Nested Refinements for Dynamic Languages 



Ravi Chugh Patrick M. Rondon Ranjit Jhala 

University of California, San Diego 
{rchugh, prondon, jhala}@cs.ucsd.edu






Abstract 

Programs written in dynamic languages make heavy use of features 
— run-time type tests, value-indexed dictionaries, polymorphism, 
and higher-order functions — that are beyond the reach of type sys- 
tems that employ either purely syntactic or purely semantic reason- 
ing. We present a core calculus, System D, that merges these two 
modes of reasoning into a single powerful mechanism of nested re- 
finement types wherein the typing relation is itself a predicate in the 
refinement logic. System D coordinates SMT-based logical impli- 
cation and syntactic subtyping to automatically typecheck sophisti- 
cated dynamic language programs. By coupling nested refinements 
with McCarthy's theory of finite maps, System D can precisely rea- 
son about the interaction of higher-order functions, polymorphism, 
and dictionaries. The addition of type predicates to the refinement 
logic creates a circularity that leads to unique technical challenges 
in the metatheory, which we solve with a novel stratification ap- 
proach that we use to prove the soundness of System D. 

1. Introduction 

So-called dynamic languages like JavaScript, Python, Racket, and 
Ruby are popular as they allow developers to quickly put together 
scripts without having to appease a static type system. However, 
these scripts quickly grow into substantial code bases that would 
be much easier to maintain, refactor, evolve and compile, if only 
they could be corralled within a suitable static type system. 

The convenience of dynamic languages comes from their sup- 
port of features like run-time type testing, value-indexed finite 
maps (i.e. dictionaries), and duck typing, a form of polymorphism 
where functions operate over any dictionary with the appropriate 
keys. As the empirical study in [13] shows, programs written in dynamic
languages make heavy use of these features, and their safety
relies on invariants which can only be established by sophisticated 
reasoning about the flow of control, the run-time types of values, 
and the contents of data structures like dictionaries. 

The following code snippet, adapted from the popular Dojo
JavaScript framework [31], illustrates common dynamic features:

    let onto callbacks f obj =
      if f = null then
        new List(obj, callbacks)
      else
        let cb = if tag f = "Str" then obj[f] else f in
        new List(fun () -> cb obj, callbacks)

The function onto is used to register callback functions to be called 
after the DOM and required library modules have finished loading. 
The author of onto went to great pains to make it extremely 
flexible in the kinds of arguments it takes. If the obj parameter 
is provided but f is not, then obj is the function to be called 
after loading. Otherwise, both f and obj are provided, and either: 
(a) f is a string, obj is a dictionary, and the (function) value 
corresponding to key f in obj is called with obj as a parameter 



after loading; or (b) f is a function which is called with obj as a 
parameter after loading. To verify the safety of this program, and 
dynamic code in general, a type system must reason about dynamic 
type tests, control flow, higher-order functions, and heterogeneous, 
value-indexed dictionaries. 
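
As an informal illustration of the three calling conventions just described, the following Python sketch mirrors onto. It is not the Dojo code and not part of System D; modelling the callback list as a Python list and null as None are assumptions made only for this example.

```python
def onto(callbacks, f, obj):
    """Register a callback, mirroring the three cases described in the text."""
    if f is None:
        # Case 1: only obj is provided, and obj itself is the callback.
        return [obj] + callbacks
    # Case (a): f is a string key, so the callback is looked up in obj.
    # Case (b): f is already a function.
    cb = obj[f] if isinstance(f, str) else f
    return [lambda: cb(obj)] + callbacks


log = []
widget = {"name": "w", "init": lambda self: log.append(self["name"])}

cbs = onto([], None, lambda: log.append("plain"))
cbs = onto(cbs, "init", widget)
cbs = onto(cbs, lambda o: log.append(len(o)), widget)

# Callbacks are prepended, so run them in registration order.
for cb in reversed(cbs):
    cb()
```

Each branch is safe only because of a run-time fact (f is null, f is a string naming a function-valued key of obj, or f is a function), which is exactly the kind of invariant a static type for onto has to capture.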

Current type systems are not expressive enough to support the 
full spectrum of reasoning required for dynamic languages. Syntac- 
tic systems use advanced type-theoretic constructs like structural 
types H ], row types i33ll , intersection types lfl2ll . and union types 
IP, 13411 to track invariants of individual values. Unfortunately, such 
techniques cannot reason about value-dependent relationships be- 
tween program variables, as is required, for example, to determine 
the specific types of the variables f and obj in onto. Semantic 
systems like qO support such reasoning by using logical predicates 
to describe invariants over program variables. Unfortunately, such 
systems require a clear (syntactic) distinction between complex values
that are typed with arrows, type variables, etc., and base values
that are typed with predicates lllill8L 12711 . Hence, they cannot sup- 
port the interaction of complex values and value-indexed dictionar- 
ies that is ubiquitous in dynamic code, for example in onto, which 
can take as a parameter a dictionary containing a function value. 

Our Approach. We present System D, a core calculus that sup- 
ports fully automatic checking of dynamic idioms. In System D all 
values are described uniformly by formulas drawn from a decid- 
able, quantifier-free refinement logic. Our first key insight is that 
to reason precisely about complex values (e.g. higher-order func- 
tions) nested deeply inside structures (e.g. dictionaries), we require 
a single new mechanism called nested refinements wherein syntac- 
tic types (resp. the typing relation) may be nested as special type 
terms (resp. type predicates) inside the refinement logic. Formally, 
the refinement logic is extended with atomic formulas of the form 
x :: U where U is a type term, "::" (read "has type") is a binary,
uninterpreted predicate in the refinement logic, and where
the formula states that the value x "has the type" described by the
term U. This unifying insight allows us to express the invariants
in idiomatic dynamic code like onto — including the interaction 
between higher-order functions and dictionaries — while staying 
within the boundaries of decidability. 

Expressiveness. The nested refinement logic underlying System 
D can express complex invariants between base values and richer 
values. For example, we may disjoin two tag-equality predicates 

\[ \{v \mid tag(v) = \texttt{"Int"} \lor tag(v) = \texttt{"Str"}\} \]

to type a value v that is either an integer or a string; we can then 
track control flow involving the dynamic type tag-lookup function 
tag to ensure that the value is safely used at either more specific 
type. To describe values like the argument f of the onto function 
we can combine tag-equality predicates with the type predicate. We 
can give f the type 

\[ \{v \mid v = null \lor tag(v) = \texttt{"Str"} \lor v :: Top \to Top\} \]



where Top is an abbreviation for $\{v \mid true\}$, which is a type that
describes all values. Notice the uniformity — the types nested 
within this refinement formula are themselves refinement types. 

Our second key insight is that dictionaries are finite maps, and 
so we can precisely type dictionaries with refinement formulas 
drawn from the (decidable) theory of finite maps COlBlll . In partic- 
ular, McCarthy's two operators — sel(x, a), which corresponds to 
the contents of the map x at the address a, and upd(x, a, v), which 
corresponds to the new map obtained by updating x at the address 
a with the value v — are precisely what we need to describe reads 
from and updates to dictionaries. For example, we can write 

\[ \{v \mid tag(v) = \texttt{"Dict"} \land tag(sel(v, y)) = \texttt{"Int"}\} \]

to type dictionaries v that have (at least) an integer field y, where 
y is a program variable that dynamically stores the key with which 
to index the dictionary. Even better, since we have nested function 
types into the refinement logic, we can precisely specify, for the 
first time, combinations of dictionaries and functions. For example, 
we can write the following type for obj

\[ \{v \mid tag(f) = \texttt{"Str"} \Rightarrow sel(v, f) :: Top \to Top\} \]

to describe the second portion of the onto specification, all while 
staying within a decidable refinement logic. In a similar manner, 
we show how nested refinements support polymorphism, datatypes, 
and even a form of bounded quantification. 

Subtyping. The huge leap in expressiveness yielded by nesting 
types inside refinements is accompanied by some unique techni- 
cal challenges. The first challenge is that because we nest complex 
types (e.g. arrows) as uninterpreted terms in the logic, subtyping
(e.g. between arrows) cannot be carried out solely via the usual syn- 
tactic decomposition into SMT queries 1, 27]. (A higher-order 
refinement logic would solve this problem, but that would preclude 
algorithmic checking; we choose the uninterpreted route precisely 
to relieve the SMT solver of higher-order reasoning!) We surmount 
this challenge with a novel decomposition mechanism where sub- 
typing between types, syntactic type terms, and refinement formu- 
las are defined inter-dependently, thereby using the logical struc- 
ture of the refinement formulas to divide the labor of subtyping 
between the SMT solver for ground predicates (e.g. equality, unin- 
terpreted functions, arithmetic, maps, etc.) and classical syntactic 
rules for type terms (e.g. arrows, type variables, datatypes, etc.). 

Soundness. The second challenge is that the inter-dependency 
between the refinement logic and the type system renders the stan- 
dard proof techniques for (refinement) type soundness inapplicable. 
In particular, we illustrate how uninterpreted type predicates break 
the usual substitution property and how nesting makes it difficult to 
define a type system that is well-defined and enjoys this property. 
We meet this challenge with a new proof technique: we define an 
infinite family of increasingly precise systems and prove soundness 
of the family, of which System D is a member, thus establishing the 
soundness of System D. 

Contributions. To sum up, we make the following contributions: 

• We show how nested refinements over the theory of finite maps 
encode function, polymorphic, dictionary and constructed data 
types within refinements and permit dependent structural sub- 
typing and a form of bounded quantification. 

• We develop a novel algorithmic subtyping mechanism that 
uses the structure of the refinement formulas to decompose 
subtyping into a collection of SMT and syntactic checks. 

• We illustrate the technical challenges that nesting poses to the 
metatheory of System D and present a novel stratification- 
based proof technique to establish soundness. 



• We define an algorithmic version of the type system with local 
type inference that we implement in a prototype checker. 
Thus, by carefully orchestrating the interplay between syntactic- 
and SMT-based subtyping, the nested refinement types of System 
D enable, for the first time, the automatic static checking of features 
found in idiomatic dynamic code. 

2. Overview 

We start with a series of examples that give an overview of our ap- 
proach. First, we show how by encoding types using logical refine- 
ments, System D can reason about control flow and relationships 
between program variables. Next, we demonstrate how nested re- 
finements enable precise reasoning about values of complex types. 
After that, we illustrate how System D uses refinements over the 
theory of finite maps to analyze value-indexed dictionaries. We 
conclude by showing how these features combine to analyze the 
sophisticated invariants in idiomatic dynamic code. 

Notation. We use the following abbreviations for brevity. 



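\[
\begin{array}{lcl}
Top(x)  & \doteq & true \\
Int(x)  & \doteq & tag(x) = \texttt{"Int"} \\
Bool(x) & \doteq & tag(x) = \texttt{"Bool"} \\
Str(x)  & \doteq & tag(x) = \texttt{"Str"} \\
Dict(x) & \doteq & tag(x) = \texttt{"Dict"} \\
IorB(x) & \doteq & Int(x) \lor Bool(x)
\end{array}
\]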



We abuse notation to use the above as abbreviations for refine- 
ment types; for each of the unary abbreviations T defined above, 
an occurrence without the parameter denotes the refinement type 
$\{v \mid T(v)\}$. For example, we write Int as an abbreviation for
$\{v \mid tag(v) = \texttt{"Int"}\}$. Recall that function values are also
described by refinement formulas (containing type predicates). We
often write arrows outside refinements to abbreviate the following: 

\[ x : T_1 \to T_2 \doteq \{v \mid v :: x : T_1 \to T_2\} \]

We write $T_1 \to T_2$ when the return type $T_2$ does not refer to $x$.

2.1 Simple Refinements 

To warm up, we show how System D describes all types through 
refinement formulas, and how, by using an SMT solver to discharge 
the subtyping (implication) queries, System D makes short work of 
value- and control flow-sensitive reasoning QUI]]. 

Ad-Hoc Unions. Our first example illustrates the simplest dy- 
namic idiom: programs which operate on ad-hoc unions. The func- 
tion negate takes an integer or boolean and returns its negation: 

    let negate x =
      if tag x = "Int" then - x else not x

In System D we can ascribe to this function the type 

\[ negate :: IorB \to IorB \]

which states that the function accepts an integer or boolean argu- 
ment and returns either an integer or boolean result. 

To establish this, System D uses the standard means of reasoning
about control flow in refinement-based systems [27], namely
strengthening the environment with the guard predicate when
processing the then-branch of an if-expression and the negation of the
guard predicate for the else-branch. Thus, in the then-branch, the
environment contains the assumption that $tag(x) = \texttt{"Int"}$, which
allows System D to verify that the expression $-x$ is well-typed.
The return value has the type $\{v \mid tag(v) = \texttt{"Int"} \land v = -x\}$.
This type is a subtype of IorB as the SMT solver can prove that
$tag(v) = \texttt{"Int"}$ and $v = -x$ implies
$tag(v) = \texttt{"Int"} \lor tag(v) = \texttt{"Bool"}$. Thus, the return
value of the then-branch is deduced to have type IorB.

On the other hand, in the else-branch, the environment contains
the assumption $\neg(tag(x) = \texttt{"Int"})$. By combining this with the
assumption about the type of negate's input,
$tag(x) = \texttt{"Int"} \lor tag(x) = \texttt{"Bool"}$, the SMT solver can
determine that $tag(x) = \texttt{"Bool"}$. This allows our system to type
check the call to

\[ not :: Bool \to Bool, \]

which establishes that the value returned in the else branch has 
type IorB. Thus, our system determines that both branches return 
a value of type IorB, and thus that negate meets its specification. 

Dependent Unions. System D's use of refinements and SMT
solvers enables expressive relational specifications that go beyond
previous techniques 03153]. While negate takes and returns ad- 
hoc unions, there is a relationship between its input and output: the 
output is an integer (resp. boolean) iff the input is an integer (resp. 
boolean). We represent this in System D as 

\[ negate :: x : IorB \to \{v \mid tag(v) = tag(x)\} \]

That is, the refinement for the output states that its tag is the same 
as the tag of the input. This function is checked through exactly the 
same analysis as before; the tag test ensures that the environment in 
the then- (resp. else-) branch implies that x and the returned value 
are both Int (resp. Bool). That is, in both cases, the output value 
has the same tag as the input. 
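
To make the dependent specification concrete, here is a small Python sketch that models tag and negate and checks, on a handful of sample inputs, that the output tag equals the input tag. It is only a run-time illustration under the assumption that Python's int and bool stand in for the calculus's integers and booleans; System D establishes the same property statically via the SMT solver.

```python
def tag(v):
    """Model of the tag primitive for the values used in this example."""
    if isinstance(v, bool):      # bool must be tested before int in Python
        return "Bool"
    if isinstance(v, int):
        return "Int"
    if isinstance(v, str):
        return "Str"
    raise TypeError("value outside this toy model")


def negate(x):
    # Precondition (IorB): tag(x) is "Int" or "Bool".
    if tag(x) == "Int":
        return -x        # then-branch: x is known to be an integer
    else:
        return not x     # else-branch: x must therefore be a boolean


samples = [0, 5, -2, True, False]
# Postcondition of the dependent type: tag(result) = tag(x).
same_tag = all(tag(negate(x)) == tag(x) for x in samples)
```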

2.2 Nested Refinements 

So far, we have seen how old-fashioned refinement types (where 
the predicates refine base values Q [TH |23l |27||) can be used to 
check ad-hoc unions over base values. However, a type system 
for dynamic languages must be able to express invariants about 
values of base and function types with equal ease. We accomplish 
this in System D by adding types (resp. the typing relation) to the 
refinement logic as nested type terms (resp. type predicates). 

However, nesting raises a rather tricky problem: with the typing 
relation included in the refinement logic, subtyping can no longer 
be carried out entirely via SMT implication queries. We solve
this problem with a new subtyping rule that extracts type terms 
from refinements to enable syntactic subtyping for nested types. 

Consider the function maybeApply which takes an integer x and 
a value f which is either null or a function over integers: 

    let maybeApply x f =
      if f = null then x else f x

In System D, we can use a refinement formula that combines a base 
predicate and a type predicate to assign maybeApply the type 

\[ maybeApply :: Int \to \{v \mid v = null \lor v :: Int \to Int\} \to Int \]

Note that we have nested a function type as a term in the refine- 
ment logic, along with an assertion that a value has this particu- 
lar function type. However, to keep checking algorithmic, we use 
a simple first-order logic in which type terms and predicates are 
completely uninterpreted; that is, the types can be thought of as 
constant terms in the logic. Therefore, we need new machinery to 
check that maybeApply actually enjoys the above type, i.e. to check 
that (a) f is indeed a function when it is applied, (b) it can accept 
the input x, and (c) it will return an integer. 

Type Extraction. To accomplish the above goals, we extract the
nested function type for f stored in the type environment as follows.
Let $\Gamma$ be the type environment at the callsite (f x). For each type
term $U$ occurring in $\Gamma$, we query the SMT solver to determine
whether $[\![\Gamma]\!] \Rightarrow f :: U$ holds, where $[\![\Gamma]\!]$ is the embedding of $\Gamma$ into



the refinement logic where type terms and predicates are treated 
in a purely uninterpreted way. If so, we say that U must flow to 
(or just, flows to) the caller expression f . Once we have found the 
type terms that flow to the caller, we map the type terms to their 
corresponding type definitions to check the call. 

Let us see how this works for maybeApply. The then-branch 
is trivial: the assumption that x is an integer in the environment 
allows us to deduce that the expression x is well-typed and has 
type Int. Next, consider the else-branch. Let $U_1$ be the type term
$Int \to Int$. Due to the bindings for x and f and the else-condition,
the environment $\Gamma$ is embedded as

\[ [\![\Gamma]\!] \doteq tag(x) = \texttt{"Int"} \land (f = null \lor f :: U_1) \land \neg(f = null) \]

Hence, the SMT solver is able to prove that $[\![\Gamma]\!] \Rightarrow f :: U_1$. This
establishes that f is a function on integers and, since x is known to 
be an integer, we can verify that the else-branch has type Int and 
hence check that maybeApply meets its specification. 
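
Because type predicates are uninterpreted, the extraction query for maybeApply is essentially propositional. The sketch below is a brute-force stand-in for the SMT solver, not the actual implementation: it treats the three atoms tag(x) = "Int", f = null and f :: U1 as independent booleans (an assumption of this illustration) and checks validity by enumerating all assignments.

```python
from itertools import product


def implies(p, q):
    return (not p) or q


def valid(formula, n_atoms):
    """A formula is valid if it holds under every truth assignment."""
    return all(formula(*vals)
               for vals in product([False, True], repeat=n_atoms))


def else_branch_goal(x_is_int, f_is_null, f_has_u1):
    # Embedded environment: bindings for x and f plus the negated guard.
    env = x_is_int and (f_is_null or f_has_u1) and (not f_is_null)
    return implies(env, f_has_u1)


def then_branch_goal(x_is_int, f_is_null, f_has_u1):
    # Same query under the guard f = null: U1 should not flow to f here.
    env = x_is_int and (f_is_null or f_has_u1) and f_is_null
    return implies(env, f_has_u1)


u1_flows_in_else = valid(else_branch_goal, 3)
u1_flows_in_then = valid(then_branch_goal, 3)
```

The query succeeds only in the else-branch, which is why the application f x is accepted there and would be rejected if it appeared under the guard f = null.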

Nested Subtyping. Next, consider a client of maybeApply: 

let _ = maybeApply 42 negate 

At the call to maybeApply we must show that the actuals are 
subtypes of the formals, i.e. that the two subtyping relationships 

\[ \Gamma_1 \vdash \{v \mid v = 42\} \sqsubseteq Int \]
\[ \Gamma_1 \vdash \{v \mid v = negate\} \sqsubseteq \{v \mid v = null \lor v :: U_1\} \qquad (1) \]

hold, where $\Gamma_1 \doteq negate : \{v \mid v :: U_0\},\ maybeApply : \cdots$ and
$U_0 \doteq x : IorB \to \{v \mid tag(v) = tag(x)\}$. Alas, while the SMT
solver can make short work of the first obligation, it cannot be used
to discharge the second via implication; the "real" types that must
be checked for subsumption, namely, $U_0$ and $U_1$, are embedded as
totally unrelated terms in the refinement logic! 

Once again, extraction rides to the rescue. We show that all
subtyping checks of the form $\Gamma \vdash \{v \mid p\} \sqsubseteq \{v \mid q\}$ can be reduced to
a finite number of sub-goals of the form:

    ("type predicate-free")  $[\![\Gamma']\!] \Rightarrow p'$
    or ("type predicate")    $[\![\Gamma']\!] \Rightarrow x :: U$

The former kind of goal has no type predicates and can be directly
discharged via SMT. For the latter, we use extraction to find the
finitely many type terms $U_i$ that flow to $x$. (If there are none, the
check fails.) For each $U_i$ we use syntactic subtyping to verify that
the corresponding type is subsumed by (the type corresponding to)
$U$ under $\Gamma'$.

In our example, the goal (1) reduces to proving either

\[ [\![\Gamma_1']\!] \Rightarrow v = null \quad \text{or} \quad [\![\Gamma_1']\!] \Rightarrow v :: U_1 \]

where $\Gamma_1' \doteq \Gamma_1, v = negate$. The former implication contains
no type predicates, so we attempt to prove it by querying the SMT
solver. The solver tells us that the query is not valid, so we turn
to the latter implication. The extraction procedure uses the SMT
solver to deduce that, under $\Gamma_1'$, the type term $U_0$ flows into $v$. Thus,
all that remains is to retrieve the definition of $U_0$ and $U_1$ and check

\[ \Gamma_1' \vdash x : IorB \to \{v \mid tag(v) = tag(x)\} \sqsubseteq Int \to Int \]

which follows via standard syntactic refinement subtyping ifTTIl . 
thereby checking the client's call. Thus, by carefully interleaving 
SMT implication and syntactic subtyping, System D enables, for 
the first time, the nesting of rich types within refinements. 

2.3 Dictionaries 

Next, we show how nested refinements allow System D to precisely 
check programs that manipulate dynamic dictionaries. In essence, 
we demonstrate how structural subtyping can be done via nested 



refinement formulas over the theory of finite maps @ [21]]. We 
introduce several abbreviations for dictionaries. 

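\[
\begin{array}{lcl}
Sel(x, y, z)   & \doteq & has(x, y) \land sel(x, y) = z \\
Fld(x, y, Int) & \doteq & Dict(x) \land Str(y) \land has(x, y) \land Int(sel(x, y)) \\
Fld(x, y, U)   & \doteq & Dict(x) \land Str(y) \land has(x, y) \land sel(x, y) :: U
\end{array}
\]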

The last abbreviation states that the type of a field is a syntactic 
type term U (e.g. an arrow). 

Dynamic Lookup. SMT-based structural subtyping allows System 
D to support the common idiom of dynamic field lookup and up- 
date, where the field name is a value computed at run-time. Con- 
sider the following function: 

    let getCount t c =
      if has t c then toInt (t[c]) else 0

The function getCount uses the primitive operation 

\[ has :: d : Dict \to k : Str \to \{v \mid Bool(v) \land v = true \Leftrightarrow has(d, k)\} \]

to check whether the key c exists in t. The refinement for the 
input d expresses the precondition that d is a dictionary, while 
the refinement for the key k expresses the precondition that k is 
a string. The refinement of the output expresses the postcondition 
that the result is a boolean value which is true if and only if d has a 
binding for the key k, expressed in our refinements using has(d, k), 
a predicate in the theory of maps that is true if and only if there is a 
binding for key k in the map d [21].

The dictionary lookup t [c] is desugared to get t c where the 
primitive operation get has the type 

\[ get :: d : Dict \to k : \{v \mid Str(v) \land has(d, k)\} \to \{v \mid v = sel(d, k)\} \]

and sel(d, k) is an operator in the theory of maps that returns the 
binding for key k in the map d. The refinement for the key k 
expresses the precondition that it is a string value in the domain 
of the dictionary d. Similarly, the refinement for the output asserts 
the postcondition that the value is the same as the contents of the 
map at the given key. 

The function getCount first tests whether the dictionary t has a
binding for the key c; if so, it is read and its contents are converted
to an integer using the function toInt, of type $Top \to Int$. Note that
the if-guard strengthens the environment under which the lookup
appears with the fact has(t, c), ensuring the safety of the lookup.
If t does not contain the key c, the default value is returned. Both
branches are thus verified to have type Int, so System D verifies
that getCount has the type $getCount :: Dict \to Str \to Int$.
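
The preconditions of has and get can be mimicked with run-time assertions, as in the Python sketch below. This is only an illustration: the assertions play the role of the refinements that System D checks statically, to_int is a hypothetical stand-in for toInt, and the default value 0 is an assumption of this example.

```python
def has(d, k):
    # Preconditions: d is a dictionary and k is a string.
    assert isinstance(d, dict) and isinstance(k, str)
    return k in d


def get(d, k):
    # Precondition: k is a string key that is in the domain of d.
    assert isinstance(d, dict) and isinstance(k, str) and k in d
    return d[k]


def to_int(v):
    """Hypothetical model of toInt :: Top -> Int; never fails."""
    if isinstance(v, bool):
        return 1 if v else 0
    if isinstance(v, int):
        return v
    if isinstance(v, str):
        try:
            return int(v)
        except ValueError:
            return 0
    return 0


def get_count(t, c):
    if has(t, c):
        # The guard establishes has(t, c), so get's precondition holds.
        return to_int(get(t, c))
    return 0  # assumed default value
```

Calling get directly on a missing key trips the assertion, whereas get_count never does, because the lookup only happens under the guard.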

Dynamic Update. Dually, to allow dynamic updates, System D 
includes a primitive 

\[ set :: d : Dict \to k : Str \to x : Top \to \{v \mid EqMod(v, d, k) \land Sel(v, k, x)\} \]

where $EqMod(d_1, d_2, k)$ abbreviates a predicate that stipulates that
$d_1$ is identical to $d_2$ at all keys except $k$. Thus, the set primitive
returns a dictionary that is identical to d everywhere except that it 
maps the key k to x. The following illustrates how set can be used 
to update (or extend) a dictionary: 

    let incCount t c =
      let newcount = 1 + getCount t c in
      let res = set t c newcount in res

We give the function incCount the type 

\[ d : Dict \to c : Str \to \{v \mid EqMod(v, d, c) \land Fld(v, c, Int)\} \]



The output type of getCount allows System D to conclude that 
newcount :: Int. From the type of set, System D deduces 

\[ res :: \{v \mid EqMod(v, t, c) \land Sel(v, c, newcount)\} \]

which is a subtype of the output type of incCount. Next, consider 

    let d0 = {"files" = 42}
    let d1 = incCount d0 "dirs"
    let _ = d1["files"] + d1["dirs"]

System D verifies that

\[ d0 :: \{v \mid Fld(v, \texttt{"files"}, Int)\} \]
\[ d1 :: \{v \mid Fld(v, \texttt{"files"}, Int) \land Fld(v, \texttt{"dirs"}, Int)\} \]
and, hence, the field lookups return Ints that can be safely added. 
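
The postcondition of set can be tested at run time on the example above. The following Python sketch is an informal model, not System D itself: set_ copies the dictionary (an assumption that mirrors the functional reading of set), eq_mod is a hypothetical executable version of EqMod, and get_count is simplified to return 0 for missing or non-integer entries.

```python
def set_(d, k, x):
    """Functional update: a new dictionary equal to d except at key k."""
    out = dict(d)
    out[k] = x
    return out


def eq_mod(d1, d2, k):
    """EqMod(d1, d2, k): d1 and d2 agree on every key other than k."""
    keys = (set(d1) | set(d2)) - {k}
    return all(key in d1 and key in d2 and d1[key] == d2[key]
               for key in keys)


def get_count(t, c):
    v = t.get(c, 0)
    return v if isinstance(v, int) and not isinstance(v, bool) else 0


def inc_count(t, c):
    newcount = 1 + get_count(t, c)
    return set_(t, c, newcount)


d0 = {"files": 42}
d1 = inc_count(d0, "dirs")
total = d1["files"] + d1["dirs"]
```

Here eq_mod(d1, d0, "dirs") holds, so the "files" binding carries over from d0 to d1, which is the fact that lets both lookups in the final sum be typed as integers.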

2.4 Type Constructors 

Next, we use nesting and extraction to enrich System D with data 
structures, thereby allowing for very expressive specifications. In 
general, System D supports arbitrary user-defined datatypes, but to 
keep the current discussion simple, let us consider a single type 
constructor List[T] for representing unbounded sequences of T- 
values. Informally, an expression of type List[T] is either a special 
null value or a dictionary with a "hd" key of type T and a "tl" 
key of type List[T]. As for arrows, we use the following notation 
to write list types outside of refinements. 

\[ List[T] \doteq \{v \mid v :: List[T]\} \]

Recursive Traversal. Consider a textbook recursive function that 
takes a list of arbitrary values and concatenates the strings: 

    let rec concat sep xs =
      if xs = null then "" else
      let hd = xs["hd"] in
      let tl = xs["tl"] in
      if tag hd != "Str" then concat sep tl
      else if tl != null then hd ^ sep ^ concat sep tl
      else hd

We ascribe the function the type $concat :: Str \to List[Top] \to Str$.
The null test ensures the safety of the "hd" and "tl" accesses and 
the tag test ensures the safety of the string concatenation using the 
techniques described above. 
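
The list encoding (null, or a dictionary with "hd" and "tl" keys) translates directly to Python. The sketch below is an illustrative transliteration of concat, assuming None for null and a hypothetical helper cons for building cells.

```python
def cons(hd, tl):
    """Build a list cell as a dictionary with "hd" and "tl" keys."""
    return {"hd": hd, "tl": tl}


def concat(sep, xs):
    if xs is None:
        return ""
    # The null test above makes the two field accesses safe.
    hd = xs["hd"]
    tl = xs["tl"]
    if not isinstance(hd, str):
        return concat(sep, tl)              # skip non-string elements
    elif tl is not None:
        # The tag test makes the string concatenation safe.
        return hd + sep + concat(sep, tl)
    else:
        return hd


mixed = cons("a", cons(1, cons("b", None)))
joined = concat(", ", mixed)
```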

Nested Ad-Hoc Unions. We can now define ad-hoc unions over 
constructed types by simply nesting $List[\cdot]$ as a type term in the
refinement logic. The following illustrates a common Python idiom 
when an argument is either a single value or a list of values: 

    let runTest cmd fail_codes =
      let status = syscall cmd in
      if tag fail_codes = "Int" then
        not (status = fail_codes)
      else
        not (listMem status fail_codes)

Here, $listMem :: Top \to List[Top] \to Bool$ and $syscall :: Str \to Int$.
The input cmd is a string, and fail_codes is either a single integer
or a list of integer failure codes. Because we nest $List[\cdot]$ as a
type term in our logic, we can use the same kind of type extraction 
reasoning as we did for maybeApply to ascribe runTest the type 

\[ runTest :: Str \to \{v \mid Int(v) \lor v :: List[Int]\} \to Bool \]

2.5 Parametric Polymorphism 

Similarly, we can add parametric polymorphism to System D by 
simply treating type variables A, B, etc. as (uninterpreted) type 



terms in the logic. As before, we use the following notation to write 
type variables outside of refinements. 

A = {v | v :: A} 

Generic Containers. We can compose the type constructors in 
the ways we all know and love. Here is list map in System D: 

    let rec map f xs =
      if xs = null then null
      else new List(f xs["hd"], map f xs["tl"])

(Of course, pattern matching would improve matters, but we are 
merely trying to demonstrate how much can be — and is! — 
achieved with dictionaries.) By combining extraction with the rea- 
soning used for concat, it is easy to check that 

\[ map :: \forall A, B.\ (A \to B) \to List[A] \to List[B] \]

Note that type abstractions are automatically inserted where a func- 
tion is ascribed a polymorphic type. 

Predicate Functions. Consider the list filter function: 

    let rec filter f xs =
      if xs = null then null
      else if not (f xs["hd"]) then filter f (xs["tl"])
      else new List(xs["hd"], filter f xs["tl"])

In System D, we can ascribe filter the type 

\[ \forall A, B.\ (x : A \to \{v \mid v = true \Rightarrow x :: B\}) \to List[A] \to List[B] \]

Note that the return type of the predicate, f, tells us what type 
is satisfied by values x for which f returns true, and the return 
type of filter states that the items filter returns all have the 
type implied by the predicate f . Thus, the general mechanism of 
nested refinements subsumes the kind of reasoning performed by 
specialized techniques like latent predicates [34].

Bounded Quantification. Nested refinements enable a form of 
bounded quantification. Consider the function 

let dispatch d f = d[f] d 

The function dispatch works for any dictionary d of type A that 
has a key f bound to a function that maps values of type A to values 
of type B. We can specify this via the dependent signature 

\[ \forall A, B.\ d : \{v \mid Dict(v) \land v :: A\} \to \{v \mid Fld(d, v, A \to B)\} \to B \]

Note that there is no need for explicit type bounds; all that is 
required is the conjunction of the appropriate nested refinements. 
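
As a concrete instance of this pattern, the Python sketch below passes a dictionary to one of its own function-valued fields. The counter dictionary and its "double" field are hypothetical names chosen for the example; in the signature above, A would be instantiated with a type describing counter and B with Int.

```python
def dispatch(d, f):
    # d[f] must be a function that accepts d itself as its argument.
    return d[f](d)


counter = {
    "n": 3,
    "double": lambda self: self["n"] * 2,
}

result = dispatch(counter, "double")
```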

2.6 All Together Now 

With the tools we've developed in this section, System D is now 
capable of type checking sophisticated code from the wild. The 
original source code for the following can be found in Appendix C.

Unions, Generic Dispatch, and Polymorphism. We now have 
everything we need to type the motivating example from the in- 
troduction, onto, which combined multiple dynamic idioms: dy- 
namic fields, tag-tests, and the dependency between nested dictio- 
nary functions and their arguments. Nested refinements let us for- 
malize the flexible interface for onto given in the introduction: 

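\[
\begin{array}{l}
\forall A.\ callbacks : List[Top \to Top] \\
\quad \to f : \{v \mid v = null \lor Str(v) \lor v :: A \to Top\} \\
\quad \to obj : \{v \mid v :: A \land (f = null \Rightarrow v :: Top \to Top) \land (Str(f) \Rightarrow Fld(v, f, A \to Top))\} \\
\quad \to List[Top \to Top]
\end{array}
\]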



Using reasoning similar to that used in the previous examples, 
System D checks that onto enjoys the above type, where the spec- 
ification for obj is enabled by the kind of bounded quantification 
described earlier. 

Reflection. Finally, to round off the overview, we present one last 
example that shows how all the features presented combine to allow 
System D to statically type programs that introspect on the contents 
of dictionaries. The function toXML shown below is adapted from 
the Python 3.2 standard library's plistlib.py [32]:

    let rec toXML x =
      if tag x = "Bool" then
        if x then element "true" null
        else element "false" null
      else if tag x = "Int" then
        element "integer" (intToStr x)
      else if tag x = "Str" then
        element "string" x
      else if tag x = "Dict" then
        let ks = keys x in
        let vs = map {v | Str(v) and has(x,v)} Str
                   (fun k -> element "key" k ^ toXML x[k]) ks in
        "<data>" ^ concat "\n" vs ^ "</data>"
      else element "function" null

The function takes an arbitrary value and renders it as an XML 
string, and illustrates several idiomatic uses of dynamic features. If 
we give the auxiliary function intToStr the type $Int \to Str$ and
element the type $Str \to \{v \mid v = null \lor Str(v)\} \to Str$, we can
verify that

\[ toXML :: Top \to Str \]

Of especial interest is the dynamic field lookup x[k] used in the 
function passed to map to recursively convert each binding of the 
dictionary to XML. The primitive operation keys has the type 

\[ keys :: d : Dict \to List[\{v \mid Str(v) \land has(d, v)\}] \]

that is, it returns a list of string keys that belong to the input
dictionary. Thus, ks has type $List[\{v \mid Str(v) \land has(x, v)\}]$, which
enables the call to map to typecheck, since the body of the argument
is checked in an environment where $k :: \{v \mid Str(v) \land has(x, v)\}$,
which is the type that A is instantiated with. This binding suffices to 
prove the safety of the dynamic field access. The control flow rea- 
soning described previously uses the tag tests guarding the other 
cases to prove each of them safe. 

3. Syntax and Semantics 

We begin with the syntax and evaluation semantics of System D. 
Figure 1 shows the syntax of values, expressions, and types.

Values. Values w include variables, constants, functions, type
functions, dictionaries, and records created by type constructors. 
The set of constants c include base values like integer, boolean, and 
string constants, the empty dictionary {}, and null. Logical values 
lw are all values and applications of primitive function symbols F,
such as addition + and dictionary selection sel, to logical values. 
The constant tag allows introspection on the type tag of a value at 
run-time. For example, 

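\[
\begin{array}{ll}
tag(3) = \texttt{"Int"}             & tag(true) = \texttt{"Bool"} \\
tag(\texttt{"joe"}) = \texttt{"Str"} & tag(\lambda x.\, e) = \texttt{"Fun"} \\
tag(\{\}) = \texttt{"Dict"}          & tag(\lambda A.\, e) = \texttt{"TFun"}
\end{array}
\]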

Dictionaries. A dictionary $w_1 \mathbin{+\!\!+} \{w_2 \mapsto w_3\}$ extends the
dictionary $w_1$ with the binding from string $w_2$ to value $w_3$. For example,
the dictionary mapping "x" to 3 and "y" to true is written

\[ \{\} \mathbin{+\!\!+} \{\texttt{"x"} \mapsto 3\} \mathbin{+\!\!+} \{\texttt{"y"} \mapsto true\}. \]

Figure 1. Syntax of System D

The set of constants also includes operations for extending dictio- 
naries and accessing their fields. The function get is used to access 
dictionary fields and is defined 

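\[ \begin{aligned}
\mathtt{get}\ (w \mathbin{+\!+} \{\text{"x"} \mapsto w_x\})\ \text{"x"} &= w_x \\
\mathtt{get}\ (w \mathbin{+\!+} \{\text{"y"} \mapsto w_y\})\ \text{"x"} &= \mathtt{get}\ w\ \text{"x"}
\end{aligned} \]

The function has tests for the presence of a field and is defined

\[ \begin{aligned}
\mathtt{has}\ (w \mathbin{+\!+} \{\text{"y"} \mapsto w_y\})\ \text{"x"} &= \mathtt{has}\ w\ \text{"x"} \\
\mathtt{has}\ (w \mathbin{+\!+} \{\text{"x"} \mapsto w_x\})\ \text{"x"} &= \mathit{true} \\
\mathtt{has}\ \{\}\ \text{"x"} &= \mathit{false}
\end{aligned} \]

The function set updates the value bound to a key and is defined

\[ \mathtt{set}\ d\ k\ w = d \mathbin{+\!+} \{k \mapsto w\} \]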
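
The defining equations for get, has, and set can be read as recursive functions over chains of extensions. The following Python sketch is only an illustration: it assumes a hypothetical encoding (not taken from the paper) in which the empty dictionary is `None` and `w ++ {k ↦ v}` is the tuple `(w, k, v)`.

```python
# Hypothetical encoding (an assumption for illustration, not from the paper):
#   {}              is represented by None
#   w ++ {k -> v}   is represented by the tuple (w, k, v)
EMPTY = None

def extend(w, k, v):
    """Build w ++ {k -> v}."""
    return (w, k, v)

def get(d, key):
    """Return the most recent binding of key; get is undefined on {}."""
    if d is EMPTY:
        raise KeyError(key)  # only reached when has(d, key) is false
    w, k, v = d
    return v if k == key else get(w, key)

def has(d, key):
    """Test whether key is bound anywhere in the extension chain."""
    if d is EMPTY:
        return False
    w, k, _ = d
    return True if k == key else has(w, key)

def set_(d, k, w):
    """set d k w = d ++ {k -> w} (named set_ to avoid shadowing Python's set)."""
    return extend(d, k, w)

# The dictionary {} ++ {"x" -> 3} ++ {"y" -> true} from the text.
d = extend(extend(EMPTY, "x", 3), "y", True)
```

Because get inspects the outermost binding first, a later `set_` on an existing key shadows the earlier binding rather than rewriting it, which matches the equation for set.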

Expressions. The set of expressions e consists of values, function 
applications, type instantiations, if-then-else expressions, and let- 
bindings. We use an A-normal presentation so that we need only 
define substitution of values (not arbitrary expressions) into types. 



Types. We stratify types into monomorphic types $T$ and polymorphic
type schemes $\forall A.\, S$. In System D, a type $T$ is a refinement
type of the form $\{\nu \mid p\}$, where $p$ is a refinement formula, and is
read "$\nu$ such that $p$." The values of this type are all values $w$ such
that the formula $p[w/\nu]$ "is true." What this means, formally, is
core to our approach and will be considered in detail in Section 5.

Refinement Formulas. The language of refinement formulas includes
predicates $P$, such as the equality predicate and dictionary
predicates has and sel, and the usual logical connectives. For example,
the type of integers is $\{\nu \mid \mathit{tag}(\nu) = \text{"Int"}\}$, which we
abbreviate to Int. The type of positive integers is

\[ \{\nu \mid \mathit{tag}(\nu) = \text{"Int"} \wedge \nu > 0\} \]

and the type of dictionaries with an integer field "f" is

\[ \{\nu \mid \mathit{tag}(\nu) = \text{"Dict"} \wedge \mathit{has}(\nu, \text{"f"}) \wedge \mathit{tag}(\mathit{sel}(\nu, \text{"f"})) = \text{"Int"}\}. \]

We refer to the binder $\nu$ in refinement types as "the value variable."
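
One informal way to read a refinement type is as a predicate that picks out the values satisfying its formula. The sketch below mirrors the three example types above as Python predicates. It assumes, purely for illustration, that System D dictionaries are modelled by Python `dict` objects and that `tag` only needs to cover the listed base cases; none of these function names come from the paper.

```python
def tag(v):
    """Run-time type tag for the small fragment of values modelled here."""
    if isinstance(v, bool):   # test bool first: bool is a subclass of int in Python
        return "Bool"
    if isinstance(v, int):
        return "Int"
    if isinstance(v, str):
        return "Str"
    if isinstance(v, dict):
        return "Dict"
    if callable(v):
        return "Fun"
    raise ValueError("value outside the fragment modelled here")

def is_int(v):
    """{v | tag(v) = "Int"}"""
    return tag(v) == "Int"

def is_pos_int(v):
    """{v | tag(v) = "Int" and v > 0}"""
    return tag(v) == "Int" and v > 0

def has_int_field_f(v):
    """{v | tag(v) = "Dict" and has(v, "f") and tag(sel(v, "f")) = "Int"}"""
    return tag(v) == "Dict" and "f" in v and tag(v["f"]) == "Int"
```

The type checker establishes such facts statically through SMT queries; the predicates here only show what the formulas mean for concrete values.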

Nesting: Type Predicates and Terms. To express the types of values
like functions and dictionaries containing functions, System D
permits types to be nested within refinement formulas. Formally,
the language of refinement formulas includes a form, $lw :: U$,
called a type predicate, where $U$ is a type term. The type term
$x{:}T_1 \to T_2$ describes values that have a dependent function type,
i.e. functions that accept arguments $w$ of type $T_1$ and return values
of type $T_2[w/x]$, where $x$ is bound in $T_2$. We write $T_1 \to T_2$ when
$x$ does not appear in $T_2$. Type terms $A$, $B$, etc. correspond to type
parameters to polymorphic functions. The type term Null corresponds
to the type of the constant value null. The type term $C[\overline{T}]$
corresponds to records constructed with the $C$ type constructor
instantiated with the sequence of type arguments $\overline{T}$. For example, the
type of the (integer) successor function is

\[ \{\nu \mid \nu :: x{:}\mathit{Int} \to \{\nu \mid \mathit{tag}(\nu) = \text{"Int"} \wedge \nu = x + 1\}\}, \]

dictionaries where the value at key "f" maps Int to Int have type

\[ \{\nu \mid \mathit{tag}(\nu) = \text{"Dict"} \wedge \mathit{has}(\nu, \text{"f"}) \wedge \mathit{sel}(\nu, \text{"f"}) :: \mathit{Int} \to \mathit{Int}\}, \]

and the constructed record List(1, null) can be assigned the type
$\{\nu \mid \nu :: \mathit{List}[\mathit{Int}]\}$.

Datatype Definitions. A datatype definition of $C$ defines a named,
possibly recursive type. A datatype definition includes a sequence
$\overline{\theta A}$ of type parameters $A$ paired with variance annotations $\theta$. A
variance annotation is either + (covariant), - (contravariant), or =
(bivariant). The rest of the definition specifies a sequence $\overline{f{:}T}$ of
field names and their types. The types of the fields may refer to
the type parameters of the declaration. A well-formedness check,
which will be described in Section 4, ensures that occurrences of
type parameters in the field types respect their declared variance
annotations. By convention, we will use the subscript $i$ to index
into the sequence $\overline{\theta A}$ and $j$ for $\overline{f{:}T}$. For example, $\theta_i$ refers to the
variance annotation of the $i$-th type parameter, and $f_j$ refers to the
name of the $j$-th field.

Programs. A program is a sequence of datatype definitions td 
followed by an expression e. Requiring all datatype definitions to 
appear first simplifies the subsequent presentation. 

Semantics. The small-step operational semantics of System D
is standard for a call-by-value, polymorphic lambda calculus; we
provide the formal definition in Appendix A. Following standard
practice, the semantics is parametrized by a function $\delta$ that assigns
meaning to primitive functions $c$, including dictionary operations
like has, get, and set.

4. Type Checking

In this section, we present the System D type system, comprising 
several well-formedness relations, an expression typing relation, 
and, at the heart of our approach, a novel subtyping relation which 
discharges obligations involving nested refinements through a com- 
bination of syntactic and semantic, SMT-based reasoning. We first 
define environments for type checking. 

Environments. Type environments $\Gamma$ are of the form

\[ \Gamma ::= \emptyset \mid \Gamma, x{:}S \mid \Gamma, A \mid \Gamma, p \]

where bindings either record the derived type $S$ for a variable $x$,
a type variable $A$ introduced in the scope of a type function, or a
formula $p$ that is recorded to track the control flow along branches
of an if-expression. A type definition environment $\Psi$ records the
definition of each constructor type $C$. As type definitions appear at
the beginning of a program, we assume for clarity that $\Psi$ is fixed
and globally visible, and elide it from the judgments. In the sequel,
we assume that $\Psi$ contains at least the definition

\[ \mathtt{type}\ \mathit{List}[+A]\ \{\text{"hd"} : \{\nu \mid \nu :: A\};\ \text{"tl"} : \{\nu \mid \nu :: \mathit{List}[A]\}\}. \]

4.1 Well-formedness

Figure 2 defines the well-formedness relations.

Formulas, Types and Environments. We require that types be
well-formed within the current type environment, which means that
formulas used in types are boolean propositions and mention only
variables that are currently in scope. By convention, we assume that
variables used as binders throughout the program are distinct and
different from the special value variable $\nu$, which is reserved for
types. Therefore, $\nu$ is never bound in $\Gamma$. When checking the
well-formedness of a refinement formula $p$, we substitute a fresh variable
$x$ for $\nu$ and check that $p[x/\nu]$ is well-formed in the environment
extended with $x{:}\mathit{Top}$, where $\mathit{Top} = \{\nu \mid \mathit{true}\}$.
We use fresh variables to prevent duplicate bindings of $\nu$.

Note that the well-formedness of formulas does not depend on 
type checking; all that is needed is the ability to syntactically distin- 
guish between terms and propositions. Checking that formulas are 
well-formed is straightforward; the important point is that a variable
$x$ may be used only if it is bound in $\Gamma$.

Datatype Definitions. To check that a datatype definition is well- 
formed, we first check that the types of the fields are well-formed 
in an environment containing the declared type parameters. Then, 
to enable a sound subtyping rule for constructed types in the 
sequel, we check that the declared variance annotations are re- 
spected within the type definition. For this, we use a procedure 
VarianceOk (defined in Appendix A) that recursively walks formulas
to record whether type variables occur in positive or negative
positions within the types of the fields. 

4.2 Expression Typing 

The expression typing judgment $\Gamma \vdash e :: S$, defined in Figure 3,
verifies that expression $e$ has type scheme $S$ in environment $\Gamma$. We
highlight the important aspects of the typing rules.

Constants. Each primitive constant c has a type, denoted by 
ty(c), that is used by T-CONST. Basic values like integers, booleans, 
etc. are given singleton types stating that their value equals the cor- 
responding constant in the refinement logic. For example: 

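\[ \begin{array}{ll}
1 :: \{\nu \mid \nu = 1\} & \mathit{true} :: \{\nu \mid \nu = \mathit{true}\} \\
\text{"joe"} :: \{\nu \mid \nu = \text{"joe"}\} & \mathit{false} :: \{\nu \mid \nu = \mathit{false}\}
\end{array} \]

Arithmetic and boolean operations have types that reflect their
semantics. Equality on base values is defined in the standard way,
while equality on function values is physical equality.

\[ \begin{array}{rcl}
+ & :: & x{:}\mathit{Int} \to y{:}\mathit{Int} \to \{\nu \mid \mathit{Int}(\nu) \wedge \nu = x + y\} \\
\mathtt{not} & :: & x{:}\mathit{Bool} \to \{\nu \mid \mathit{Bool}(\nu) \wedge (x = \mathit{true} \Leftrightarrow \nu = \mathit{false})\} \\
= & :: & x{:}\mathit{Top} \to y{:}\mathit{Top} \to \{\nu \mid \mathit{Bool}(\nu) \wedge (\nu = \mathit{true} \Leftrightarrow x = y)\} \\
\mathtt{fix} & :: & \forall A.\ (A \to A) \to A \\
\mathtt{tag} & :: & x{:}\mathit{Top} \to \{\nu \mid \nu = \mathit{tag}(x)\}
\end{array} \]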

The constant fix is used to encode recursion, and the type for the 
tag-test operation uses an axiomatized function in the logic. 

The operations on dictionaries are given refinement types over 
the theory of finite maps. 

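\[ \begin{array}{rcl}
\{\} & :: & \{\nu \mid \nu = \mathit{empty}\} \\
\mathtt{has} & :: & d{:}\mathit{Dict} \to k{:}\mathit{Str} \to \{\nu \mid \mathit{Bool}(\nu) \wedge (\nu = \mathit{true} \Leftrightarrow \mathit{has}(d, k))\} \\
\mathtt{get} & :: & d{:}\mathit{Dict} \to k{:}\{\nu \mid \mathit{Str}(\nu) \wedge \mathit{has}(d, \nu)\} \to \{\nu \mid \nu = \mathit{sel}(d, k)\} \\
\mathtt{set} & :: & d{:}\mathit{Dict} \to k{:}\mathit{Str} \to x{:}\mathit{Top} \to \{\nu \mid \mathit{EqMod}(\nu, d, k) \wedge \mathit{has}(\nu, k) \wedge \mathit{sel}(\nu, k) = x\} \\
\mathtt{keys} & :: & d{:}\mathit{Dict} \to \mathit{List}[\{\nu \mid \mathit{Str}(\nu) \wedge \mathit{has}(d, \nu)\}]
\end{array} \]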

In the theory of finite maps, the operator dom(d) denotes the do- 
main of the map d, and restrict (d, y) restricts d to the set of keys 
y. (These primitives can all be reduced to McCarthy's select and 
update operators i20ll2Tll; we define these in Appendix A). Thus,
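we define empty as a special constant such that $\mathit{dom}(\mathit{empty}) = \emptyset$.
The refinements for the other operators use $\mathit{has}(d, k)$, which abbreviates
$k \in \mathit{dom}(d)$, and $\mathit{EqMod}(d_1, d_2, a)$, which abbreviates

\[ \mathit{restrict}(d_1, \mathit{dom}(d_1) \setminus \{a\}) = \mathit{restrict}(d_2, \mathit{dom}(d_2) \setminus \{a\}) \]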

Figure 3. Type checking for System D

The predicate $\mathit{has}(d, k)$ checks that a key $k$ is defined in a
map $d$, and is used as a precondition for get. The predicate
$\mathit{EqMod}(d_1, d_2, k)$ states that the dictionaries $d_1$ and $d_2$ are
identical except at the key $k$. This is useful for dictionary updates where
we do not know the exact value being stored, but do know some
abstraction thereof, e.g. its type. For example, in incCounter (from
Section 2), we do not know what value is stored in the count field
c, only that it is an integer. Thus, we say that the new dictionary
is the same as the old except at c, where the binding is an integer.
A more direct approach would be to use an existentially quantified
variable to represent the stored value and say that the resulting
dictionary is the original dictionary updated to contain this quantified
value. Unfortunately, that would take the formulas outside the
decidable quantifier-free fragment of the logic, thereby precluding
SMT-based logical subtyping.
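
To make the meaning of has and EqMod concrete, the sketch below evaluates them on concrete finite maps. It assumes, for illustration only, that finite maps are modelled by Python `dict` objects; `restrict`, `eq_mod`, and `set_` are hypothetical helper names, and the SMT solver reasons about these predicates symbolically rather than by evaluation.

```python
def restrict(d, keys):
    """restrict(d, y): the map d limited to the keys in y."""
    return {k: v for k, v in d.items() if k in keys}

def has(d, k):
    """has(d, k) abbreviates k in dom(d)."""
    return k in d

def eq_mod(d1, d2, a):
    """EqMod(d1, d2, a): d1 and d2 agree everywhere except possibly at key a."""
    return restrict(d1, set(d1) - {a}) == restrict(d2, set(d2) - {a})

def set_(d, k, x):
    """Functional update: returns a new map and leaves d unchanged."""
    out = dict(d)
    out[k] = x
    return out

old = {"count": 1, "name": "c"}
new = set_(old, "count", 2)
```

Here `eq_mod(new, old, "count")` holds while `eq_mod(new, old, "name")` does not, which is the kind of fact a refinement about an update can state without naming the exact stored value.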

Standard Rules. We briefly identify several typing rules that are
standard for lambda calculi with dependent refinements. T-VAR
and T-VarPoly assign types to variable expressions $x$. If $x$ is
bound to a (monomorphic) refinement type in $\Gamma$, then T-VAR
assigns $x$ the singleton type that says that the expression $x$ evaluates
to the same value as the variable $x$. T-IF assigns the type scheme
$S$ to an if-expression if the condition $w$ is a boolean-valued
expression, the then-branch expression $e_1$ has type scheme $S$ under the
assumption that $w$ evaluates to true, and the else-branch
expression $e_2$ has type scheme $S$ under the assumption that $w$ evaluates to
false. The T-APP rule is standard, but notice that the arrow type
of $w_1$ is nested inside a refinement type. In T-LET, the type scheme
$S_2$ must be well-formed in $\Gamma$, which prevents the variable $x$ from
escaping its scope. T-SUB allows expression $e$ to be used with type
$S$ if $e$ has type $S'$ and $S'$ is a subtype of $S$.

Type Instantiation. The T-TAPP rule uses the procedure Inst
to instantiate a type variable with a (monomorphic) type. Inst is
defined recursively on formulas, type terms, and types, where the
only non-trivial case involves type predicates with type variables:

\[ \begin{aligned}
\mathit{Inst}(lw :: A,\ A,\ \{\nu \mid p\}) &= p[lw/\nu] \\
\mathit{Inst}(lw :: B,\ A,\ T) &= lw :: B
\end{aligned} \]

We write $\mathit{Inst}(S, \overline{A}, \overline{T})$ to mean the result of applying Inst to $S$
with the type variables and type arguments in succession.
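
The two Inst equations can be sketched as a small recursive function over formulas. The encoding below is hypothetical and covers only the formula cases (conjunction, simple predicates, and type predicates whose type term is a type variable); the cases for type terms and types, which the text describes as routine, are omitted.

```python
# Hypothetical formula encoding (illustration only):
#   ("and", f, g)            conjunction
#   ("pred", name, *terms)   simple predicate over terms, e.g. ("pred", "Int", "v")
#   ("hastype", lw, U)       type predicate lw :: U, with U a type-variable name
# The value variable is the string "v"; a refinement type {v | p} is stored as p.

def subst(formula, term, var="v"):
    """p[term/var]: replace every occurrence of var by term."""
    if isinstance(formula, str):
        return term if formula == var else formula
    return tuple(subst(part, term, var) for part in formula)

def inst(formula, tyvar, refinement):
    """Inst(formula, A, {v | p}): instantiate type variable A with {v | p}."""
    head = formula[0]
    if head == "hastype":
        _, lw, u = formula
        # lw :: A becomes p[lw/v]; lw :: B with B != A is left unchanged.
        return subst(refinement, lw) if u == tyvar else formula
    if head == "and":
        return ("and", inst(formula[1], tyvar, refinement),
                       inst(formula[2], tyvar, refinement))
    return formula  # simple predicates contain no type predicates

pos_int = ("and", ("pred", "Int", "v"), ("pred", ">", "v", "0"))
example = ("and", ("hastype", "x", "A"), ("hastype", "y", "B"))
```

Instantiating `A` with the positive-integer refinement in `example` replaces `x :: A` by the refinement's formula with `x` in place of the value variable, and leaves `y :: B` untouched.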

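Fold and Unfold. The T-FOLD rule is used for records of data
created with the datatype constructor $C$ and type arguments $\overline{T}$. The
rule succeeds if the argument $w_j$ provided for each field $f_j$ has
the required type $T_j$ after instantiating all type parameters $\overline{A}$ with
the type arguments $\overline{T}$. If these conditions are satisfied, the formula
returned by $\mathit{Fold}(C, \overline{T}, \overline{w})$, defined as

\[ \nu \neq \mathit{null} \wedge \mathit{tag}(\nu) = \text{"Dict"} \wedge \nu :: C[\overline{T}] \wedge \Big(\bigwedge_j \mathit{sel}(\nu, f_j) = w_j\Big) \]

records that the value is non-null, that the values stored in the
fields are precisely the values used to construct the record, and
that the value has a type corresponding to the specific constructor
used to create the value. T-UNFOLD exposes the fields of non-null
constructed data as a dictionary, using $\mathit{Unfold}(C, \overline{T})$, defined as

\[ \nu \neq \mathit{null} \Rightarrow \Big(\mathit{tag}(\nu) = \text{"Dict"} \wedge \bigwedge_j [\![T''_j]\!](\mathit{sel}(\nu, f_j))\Big) \]

where $\Psi(C) = [\overline{\theta A}]\{\overline{f{:}T'}\}$, $[\![\{\nu \mid p\}]\!](lw) = p[lw/\nu]$, and for all
$j$, $T''_j = \mathit{Inst}(T'_j, \overline{A}, \overline{T})$. For example, Unfold(List, Int) is

\[ \nu \neq \mathit{null} \Rightarrow \big(\mathit{tag}(\nu) = \text{"Dict"} \wedge \mathit{tag}(\mathit{sel}(\nu, \text{"hd"})) = \text{"Int"} \wedge \mathit{sel}(\nu, \text{"tl"}) :: \mathit{List}[\mathit{Int}]\big) \]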

4.3 Subtyping 

In traditional refinement type systems, there is a two-level hierar- 
chy between types and refinements that allows a syntax-directed 
reduction of subtyping obligations to SMT implications ITU Ql| 
l27ll . In contrast, System D's refinements include uninterpreted type 
predicates that are beyond the scope of (first-order) SMT solvers. 

Let us consider the problem of establishing the subtyping judgment
$\Gamma \vdash \{\nu \mid p_1\} \sqsubseteq \{\nu \mid p_2\}$. We cannot use the SMT query

\[ [\![\Gamma]\!] \wedge p_1 \Rightarrow p_2 \tag{2} \]

as the presence of (uninterpreted) type predicates may conservatively
render the implication invalid. Instead, our strategy is to massage
the refinements into a normal form that makes it easy to factor
the implication in (2) into a collection of subgoals whose consequents
are either simple (non-type) predicates or type predicates.
The former can be established via SMT and the latter by recursively
invoking syntactic subtyping. Next, we show how this strategy is
realized by the rules in Figure 4.

Figure 4. Subtyping for System D

Step 1: Split query into subgoals. We start by converting $p_2$ into
a normalized conjunction $\bigwedge_i (q_i \Rrightarrow r_i)$. Each conjunct, or clause,
$q_i \Rrightarrow r_i$ is normalized such that its consequent is a disjunction
of type predicates. We use the symbol $\Rrightarrow$ instead of the usual
implication arrow $\Rightarrow$ to emphasize the normal structure of each
clause. By splitting $p_2$ into its normalized clauses, rule S-MONO
reduces the goal (2) to the equivalent collection of subgoals

\[ \forall i.\ \Gamma, p_1 \vdash q_i \Rrightarrow r_i \]

Step 2: Discharge subgoals. The normalization ensures that the
consequent of each subgoal above is a disjunction of type predicates.
When the disjunction of a clause is empty, the subgoal is

\[ \text{("type predicate-free")} \qquad \Gamma, p_1 \vdash q_i \Rrightarrow \mathit{false} \]

which rule C-VALID handles by SMT. Otherwise, the subgoal is

\[ \text{("type predicate")} \qquad \Gamma, p_1 \vdash q_i \Rrightarrow \bigvee_j lw_j :: U_j \]

which rule C-ImpSyn handles via type extraction followed by an
invocation of syntactic subtyping. In particular, the rule tries to
establish one of the disjuncts $lw_j :: U_j$, by searching for a type
term $U$ that occurs in $\Gamma$ that 1) flows to $lw_j$, i.e. for which we can
deduce via SMT that

\[ [\![\Gamma]\!] \wedge p_1 \wedge q_i \Rightarrow lw_j :: U \]

is valid and, 2) is a syntactic subtype of $U_j$ in an appropriately
strengthened environment (written $\Gamma, p_1, q_i \vdash U <: U_j$). The rules
U-DATATYPE and U-ARROW establish syntactic (refinement) subtyping,
by (recursively) establishing that subtyping holds for the
matching components |^,[TI1,|2^1. Because syntactic subtyping
recursively refers to subtyping, the S-MONO rule uses fresh variables
to avoid duplicate bindings of $\nu$ in the environment.

Formula Normalization. Procedure Normalize converts a formula
$p$ into a conjunction of clauses $\bigwedge_i (q_i \Rrightarrow r_i)$ as described
above. The conversion is carried out by translating $p$ to conjunctive
normal form (CNF), and then for each CNF clause, rearranging literals
and adding negations as necessary. For example,

\[ \begin{aligned}
\mathit{Normalize}(\nu = \mathit{null}) &= \neg(\nu = \mathit{null}) \Rrightarrow \mathit{false} \\
\mathit{Normalize}(\nu = \mathit{null} \vee \nu :: U) &= \neg(\nu = \mathit{null}) \Rrightarrow \nu :: U
\end{aligned} \]

Formula Implication. In each SMT implication query $[\![\Gamma]\!] \wedge p \Rightarrow q$,
the operator $[\![\cdot]\!]$ describes the embedding of environments and
types into the logic as follows:

\[ \begin{array}{ll}
[\![\{\nu \mid p\}]\!] = p & [\![\Gamma, x{:}T]\!] = [\![\Gamma]\!] \wedge [\![T]\!][x/\nu] \\
[\![\emptyset]\!] = \mathit{true} & [\![\Gamma, x{:}\forall A.\, S]\!] = [\![\Gamma]\!] \\
[\![\Gamma, p]\!] = [\![\Gamma]\!] \wedge p & [\![\Gamma, A]\!] = [\![\Gamma]\!]
\end{array} \]

Recap. Recall that our goal is to typecheck programs which use
value-indexed dictionaries which may contain functions as values.
On the one hand, the theory of finite maps allows us to use logical
refinements to express and verify complex invariants about the
contents of dictionaries. On the other, without resorting to
higher-order logic, such theories cannot express that a dictionary maps a
key to a value of function type.

To resolve this tension, we introduced the novel concept of
nested refinements, where types are nested into the logic as
uninterpreted terms and the typing relation is nested as an uninterpreted
predicate. The logical validity queries arising in typechecking are
discharged by rearranging the formula in question into an implication
between a purely logical formula and a disjunction of type
predicates. This implication is discharged using a novel combination
of logical queries, discharged by an SMT solver, and syntactic
subtyping. This approach enables the efficient, automatic type
checking of sophisticated dynamic language programs that manipulate
complex data, including dictionaries which map keys to function
values.
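
The clause rearrangement performed by Normalize in Section 4.3 can be sketched on a single CNF clause as follows. The literal encoding and the function names are hypothetical, and the choice to move negated type predicates into the antecedent is an assumption of this sketch rather than something the text states.

```python
# A literal is (kind, text, positive): kind is "type" for a type predicate such as
# "v :: U" and "simple" for any other predicate. This encoding is hypothetical.

def negate(literal):
    kind, text, positive = literal
    return (kind, text, not positive)

def normalize_clause(literals):
    """Rearrange one CNF clause (a disjunction of literals) into (q, r).

    q is a list of literals read as a conjunction (the antecedent) and r is a
    list of positive type predicates read as a disjunction (the consequent);
    an empty r stands for false. Moving negated type predicates into q is an
    assumption of this sketch.
    """
    def is_goal(lit):
        return lit[0] == "type" and lit[2]
    r = [lit for lit in literals if is_goal(lit)]
    q = [negate(lit) for lit in literals if not is_goal(lit)]
    return q, r

v_null = ("simple", "v = null", True)
v_has_u = ("type", "v :: U", True)
```

On the two examples from the text, the clause `v = null` yields the antecedent `not (v = null)` with an empty consequent, and the clause `v = null or v :: U` yields the same antecedent with consequent `v :: U`.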



5. Soundness 

At this point in the proceedings, it is customary to make a claim 
about the soundness of the type system by asserting that it enjoys 
the standard preservation and progress properties. Unfortunately, 
the presence of nested refinements means this route is unavailable 
to us, as the usual substitution property does not hold! Next, we 
describe why substitution is problematic and define a stratified 
system System D* for which we establish the preservation and 
progress properties. The soundness of System D follows, as it is 
a special case of the stratified System D*. 

5.1 The Problems 

The key insight in System D is that we can use uninterpreted 
functions to nest types inside refinements, thereby unlocking the 
door to expressive SMT-based reasoning for dynamic languages. 
However, this very strength precludes the usual substitution lemma 
upon which preservation proofs rest. 

Substitution. The standard substitution property requires that if
$x{:}S, \Gamma \vdash e :: S'$ and $\vdash w :: S$, then $\Gamma[w/x] \vdash e[w/x] :: S'[w/x]$.
The following snippet shows why System D lacks this property:

let foo f = 0 in foo (fun x -> x + 1)

Suppose that we ascribe to foo the type

\[ \mathtt{foo} :: f{:}(\mathit{Int} \to \mathit{Int}) \to \{\nu \mid f :: \mathit{Int} \to \mathit{Int}\}. \]

The return type of the function states that its argument $f$ is a
function from integers to integers and does not impose any constraints
on the return value itself. To check that foo does indeed have this
type, by T-FUN, the following judgment must be derivable:

\[ f{:}\mathit{Int} \to \mathit{Int} \vdash 0 :: \{\nu \mid f :: \mathit{Int} \to \mathit{Int}\} \tag{3} \]

By T-CONST, T-SUB, S-MONO and C-VALID the judgment reduces
to the implication

\[ \mathit{true} \wedge f :: \mathit{Int} \to \mathit{Int} \wedge [\![\mathit{ty}(0)]\!][0/\nu] \Rightarrow f :: \mathit{Int} \to \mathit{Int} \]

which is trivially valid, thereby deriving (3), and showing that foo
does indeed have the ascribed type.

Next, consider the call to foo. By T-APP, the result has type

\[ \{\nu \mid (\texttt{fun x -> x + 1}) :: \mathit{Int} \to \mathit{Int}\}. \]

The expression foo (fun x -> x + 1) evaluates in one step to
0. Thus, if the substitution property is to hold, 0 should also have
the above type. In other words, System D must be able to derive

\[ \vdash 0 :: \{\nu \mid (\texttt{fun x -> x + 1}) :: \mathit{Int} \to \mathit{Int}\}. \]

By T-CONST, T-SUB, S-MONO, and C-VALID, the judgment reduces
to the implication

\[ \mathit{true} \wedge [\![\mathit{ty}(0)]\!][0/\nu] \Rightarrow (\texttt{fun x -> x + 1}) :: \mathit{Int} \to \mathit{Int} \tag{4} \]

which is invalid as type predicates are uninterpreted in our refinement
logic! Thus, the call to foo and the reduced value do not have
the same type in System D, which illustrates the crux of the problem:
the C-VALID rule is not closed under substitution.
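
The difference between the two implications can be illustrated with a rough propositional abstraction. The sketch below treats each type predicate as an opaque boolean atom and checks validity by enumerating all truth assignments; this abstraction and the names `valid`, `P`, and `Q` are assumptions made for illustration, and the remaining conjuncts (such as the singleton fact about 0) are dropped because they do not mention the atoms.

```python
from itertools import product

def valid(atoms, antecedent, consequent):
    """Check that antecedent => consequent holds under every assignment to atoms."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if antecedent(env) and not consequent(env):
            return False  # found an interpretation that falsifies the implication
    return True

# P stands for "f :: Int -> Int" (before substitution).
# Q stands for "(fun x -> x + 1) :: Int -> Int" (after substitution).
before = valid(["P"], lambda e: e["P"], lambda e: e["P"])  # shape of the implication behind (3)
after = valid(["Q"], lambda e: True, lambda e: e["Q"])     # shape of implication (4)
```

Before substitution the environment supplies the atom as a hypothesis, so the implication is valid; after substitution the hypothesis is gone and an uninterpreted atom can be assigned false, so validity fails.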

Circularity. Thus, it is clear that the substitution lemma will
require that we define an interpretation for type predicates. As a
first attempt, we can define an interpretation $\mathcal{I}$ that interprets type
predicates involving arrows as:

\[ \mathcal{I} \models \lambda x.\, e :: x{:}T_1 \to T_2 \quad\text{iff}\quad x{:}T_1 \vdash e :: T_2. \]

Next, let us replace C-VALID with the following rule that restricts
the antecedent to the above interpretation:

\[ \frac{\mathcal{I} \models [\![\Gamma]\!] \wedge p \Rightarrow q}{\Gamma \vdash p \Rrightarrow q} \quad \text{[C-VALID-INTERPRETED]} \]

Notice that the new rule requires the implication be valid in the
particular interpretation $\mathcal{I}$ instead of in all interpretations. This
allows the logic to "hook back" into the type system to derive types
for closed lambda expressions, thereby discharging the problematic
implication query in (4). While the rule solves the problem with
substitution, it does not take us safely to the shore: it introduces
a circular dependence between the typing judgments and the
interpretation $\mathcal{I}$. Since our refinement logic includes negation, the type
system corresponding to the set of rules outlined earlier combined
with C-VALID-INTERPRETED is not necessarily well-defined.

5.2 The Solution: Stratified System D* 

Thus, to prove soundness, we require a well-founded means of in- 
terpreting type predicates. We achieve this by stratifying the inter- 
pretations and type derivations, requiring that type derivations at 
each level refer to interpretations at the same level, and that inter- 
pretations at each level refer to derivations at strictly lower levels. 
Next, we formalize this intuition and state the important lemmas 
and theorems. The full proofs may be found in Appendix A.

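Formally, we make the following changes. First, we index typing
judgments ($\vdash_n$) and interpretations ($\mathcal{I}_n$) with a natural number
$n$. We call these the level-$n$ judgments and interpretations,
respectively. Second, we allow level-$n$ judgments to use the rule

\[ \frac{\mathcal{I}_n \models [\![\Gamma]\!] \wedge p \Rightarrow q}{\Gamma \vdash_n p \Rrightarrow q} \quad \text{[C-VALID-N]} \]

and the level-$n$ interpretations to use lower-level type derivations:

\[ \mathcal{I}_n \models \lambda x.\, e :: x{:}T_1 \to T_2 \quad\text{iff}\quad x{:}T_1 \vdash_{n-1} e :: T_2. \]

Finally, we write

\[ \Gamma \vdash_* e :: S \quad\text{iff}\quad \exists n.\ \Gamma \vdash_n e :: S. \]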
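
To see how stratification repairs the example from Section 5.1, the following derivation sketch discharges implication (4) at level 1. It assumes, without proof here, that $x{:}\mathit{Int} \vdash_0 x + 1 :: \mathit{Int}$ is derivable with the original rules; the step labels are informal.

```latex
\begin{align*}
& x{:}\mathit{Int} \vdash_0 x + 1 :: \mathit{Int}
  && \text{(assumed derivable in System D)} \\
\Longrightarrow\quad & \mathcal{I}_1 \models (\texttt{fun x -> x + 1}) :: \mathit{Int} \to \mathit{Int}
  && \text{(definition of } \mathcal{I}_1\text{)} \\
\Longrightarrow\quad & \mathcal{I}_1 \models \mathit{true} \wedge [\![\mathit{ty}(0)]\!][0/\nu]
     \Rightarrow (\texttt{fun x -> x + 1}) :: \mathit{Int} \to \mathit{Int}
  && \text{(consequent holds in } \mathcal{I}_1\text{)} \\
\Longrightarrow\quad & \vdash_1 0 :: \{\nu \mid (\texttt{fun x -> x + 1}) :: \mathit{Int} \to \mathit{Int}\}
  && \text{(T-CONST, T-SUB, S-MONO, C-VALID-N)}
\end{align*}
```

Under this reading, the call is typed at level 0 and its reduct 0 at level 1, which is the one-level shift that appears in the stratified lemmas of this section.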



The derivations in System D* consist of the derivations at all levels. 
The following "lifting" lemma states that the derivations at each 
level include the derivations at all lower levels: 

Lemma (Lifting Derivations).

1. If $\Gamma \vdash e :: S$, then $\Gamma \vdash_* e :: S$.

2. If $\Gamma \vdash_n e :: S$, then $\Gamma \vdash_{n+1} e :: S$.

The first clause holds since the original System D derivations
cannot use the C-VALID-N rule, i.e. $\Gamma \vdash e :: S$ exactly when
$\Gamma \vdash_0 e :: S$. The second clause follows from the definitions of
$\vdash_n$ and $\mathcal{I}_n$. Stratification snaps the circularity knot and enables the
proof of the following stratified substitution lemma:

Lemma (Stratified Substitution).

If $x{:}S, \Gamma \vdash_n e :: S'$ and $\vdash_n w :: S$,
then $\Gamma[w/x] \vdash_{n+1} e[w/x] :: S'[w/x]$.

The proof of the above depends on the following lemma, which
captures the connection between our typing rules and the logical
interpretation of formulas in our refinement logic:

Lemma (Satisfiable Typing).

If $\vdash_n w :: T$, then $\mathcal{I}_{n+1} \models [\![T]\!][w/\nu]$.

Stratified substitution enables the following preservation result:

Theorem (Stratified Preservation).

If $\vdash_n e :: S$, and $e \hookrightarrow e'$ then $\vdash_{n+1} e' :: S$.

From this, and a separate progress result, we establish the type
soundness of System D*:

Theorem (System D* Type Soundness).

If $\vdash_* e :: S$, then either $e$ is a value or $e \hookrightarrow e'$ and $\vdash_* e' :: S$.

By coupling this with Lifting, we obtain the soundness of System 
D as a corollary. 

6. Algorithmic Typing 

Having established the expressiveness and soundness of System 
D, we establish its practicality by implementing a type checker 
and applying it to several interesting examples. The declarative 
rules for type checking System D programs, shown in Section 4,
are not syntax-directed and thus unsuitable for implementation. We
highlight the problematic rules and sketch an algorithmic version
of the type system that also performs local type inference [25].
The algorithmic system is sound with respect to the declarative 
one and, modulo a restriction to ensure that subtyping terminates, 
is as precise. Our prototype implementation jUy verifies all of the 
examples in this paper and in [34], using Z3 (3] to discharge SMT
obligations. A more detailed discussion of the algorithmic system 
may be found in Appendix B.

6.1 Algorithmic Subtyping 

Nearly all the declarative subtyping rules presented in Figure 4 are
non-overlapping and directed by the structure of the judgment being
derived. The sole exception is C-ImpSyn, whose first premise
requires us to synthesize a type term $U$ such that the SMT solver
can prove $lw_j :: U$ for some $j$, where $U$ is used in the second
premise. We note that, since type predicates are uninterpreted, the
only type terms $U$ that can satisfy this criterion must come from
the environment $\Gamma$. Thus, we define a procedure $\mathit{MustFlow}(\Gamma, T)$
that uses the SMT solver to compute the set of type terms $U'$, out
of all possible type terms mentioned in $\Gamma$, such that for all values
$x$, $x{:}T$ implies that $x :: U'$. To implement C-ImpSyn, we call
$\mathit{MustFlow}(\Gamma, \{\nu \mid \nu = lw_j\})$ to compute the set $\overline{U}$ of type terms
that might be needed by the second premise. Since the declarative
rule cannot possibly refer to a type term $U$ that is not in $\Gamma$, this
strategy guarantees that $U \in \overline{U}$ and, thus, does not forfeit precision.

Ensuring Termination. An important concern remains: because 
we extract type terms from the environment and recursively invoke 
the subtyping relation on them, we do not have the usual guaran- 
tee that subtyping is recursively invoked on strictly syntactically 
smaller terms, and thus it is not clear whether subtyping checks 
will terminate. Indeed, they may not! Appendix B presents an example
obligation that, although unlikely to appear in practice, leads
to non-termination when subtyping is implemented directly. The 
crux of the matter is that an inner subtyping obligation may be iso- 
morphic to an outer one, triggering an infinitely repeating deriva- 
tion. Fortunately, we can cut the loop as follows: along any branch 
of a subtyping derivation, we allow a type term to be returned by 
MustFlow at most once. Since there are only finitely many type 
terms in the environment, this is enough to ensure termination. The 
price we pay is that algorithmic subtyping is not complete with re- 
spect to declarative subtyping; we have not found and do not expect 
this to be a problem in practice. 
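
The loop-cutting idea can be sketched as a recursive search that threads the set of type terms already used along the current branch. This is a heavily simplified, hypothetical model: `must_flow` is a plain table standing in for the terms that MustFlow would extract and for the obligations that syntactic subtyping on each term would generate, and none of these names come from the implementation.

```python
def subtype(goal, must_flow, used=frozenset()):
    """Return True if goal can be discharged, using each term at most once per branch.

    must_flow maps a goal to a list of (type_term, subgoals) candidates.
    """
    for term, subgoals in must_flow.get(goal, []):
        if term in used:
            continue  # cut the loop: this term was already returned on this branch
        extended = used | {term}
        if all(subtype(g, must_flow, extended) for g in subgoals):
            return True
    return False

# A cyclic obligation: discharging g via U regenerates g itself.
cyclic = {"g": [("U", ["g"])]}
# An acyclic one: g needs h, and h holds with no further obligations.
acyclic = {"g": [("U", ["h"])], "h": [("V", [])]}
```

On the cyclic table the search gives up instead of looping, which reflects the trade-off described above: termination is guaranteed because the set of terms is finite, at the cost of possibly rejecting an obligation that the declarative rules could derive.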

6.2 Bidirectional Type Checking 

We extend the syntax of System D with optional type annotations 
for binding constructs and constructed data, and, following work on 
local type inference [25], we define a bidirectional type checking
algorithm. In the remainder of this section, we highlight the novel 
aspects of our bidirectional type system. 

Function Applications. To typecheck an application $w_1\ w_2$, we
must synthesize a type $T_1$ for the function $w_1$ and use type
extraction to convert $T_1$ to a syntactic arrow. Since the procedure
MustFlow can return an arbitrary number of type terms, we must
decide how to proceed in the event that $T_1$ can be extracted to
multiple different arrow types. To avoid the need for backtracking in the
type checker, and to provide a semantics that is simple for the
programmer to understand, we synthesize a type for $w_1$ only if there is
exactly one syntactic arrow that is applicable to the given argument.
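
The "exactly one applicable arrow" policy amounts to a small selection function. In the sketch below, `pick_arrow` and the representation of arrows as (name, domain check) pairs are hypothetical; in the actual checker, applicability would be decided by subtyping on the argument type rather than by a run-time test.

```python
def pick_arrow(arrows, applicable):
    """Return the single applicable arrow, or None when zero or several apply."""
    matches = [arrow for arrow in arrows if applicable(arrow)]
    return matches[0] if len(matches) == 1 else None

# Each arrow is (name, domain_check); the checks are stand-ins for subtyping tests.
arrows = [
    ("Int -> Int", lambda v: isinstance(v, int)),
    ("Str -> Str", lambda v: isinstance(v, str)),
]
chosen = pick_arrow(arrows, lambda arrow: arrow[1](3))
```

Returning None in the ambiguous case corresponds to declining to synthesize a type, which avoids backtracking over candidate arrows.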

Remaining Rules. We will now briefly summarize some of the 
other algorithmic rules presented in Appendix B. Uses of T-SUB
can be factored into other typing rules. However, uses of
T-UNFOLD cannot, since we cannot syntactically predict where 
it is needed. Since we do not have pattern matching to determine 
exactly when to unfold type definitions, as in languages like ML, 
we eagerly unfold type definitions to anticipate all situations in 
which unfolding might be required. For let-expressions, to han- 
dle the fact that synthesized types might refer to variables that are 
about to go out of scope, making them ill-formed, we use several 
simple heuristics to eliminate occurrences of local variables. In all 
of the examples we have tested, the annotations provided on top- 
level let-bindings are sufficient to allow synthesizing well-formed 
types for all unannotated inner let-expressions. Precise types are 
synthesized for if-expressions by synthesizing the types of both 
branches, guarding them by the appropriate branch conditions, and 
conjoining them. For constructed data expressions, we allow the 
programmer to provide hints in type definitions that help the type 
checker decide how to infer type parameters that are omitted. For 
example, suppose the List definition is updated as follows: 

\[ \mathtt{type}\ \mathit{List}[+A]\ \{\text{"hd"} : \{\nu \mid \nu :: A\};\ \text{"tl"} : \{\nu \mid \nu :: \mathit{List}[{*}A]\}\} \]

Due to the presence of the marker * in the type of the "tl" field,
local type inference will use the type of $w_2$ to infer the omitted type
parameter in List($w_1$, $w_2$). Finally, although the techniques in [25]
would allow us to, for simplicity we do not attempt to synthesize
parameters to type functions.



Soundness. We write $\Gamma \vdash e \triangleleft S$ for the algorithmic type checking
judgment, which verifies $e$ against the given type $S$, and
$\Gamma \vdash e \triangleright S$ for the algorithmic type synthesis judgment, which
produces a type $S$ for expression $e$. Each of the techniques employed
in this section is sound with respect to the declarative system, so
we can show the following property, where we use a procedure
erase to remove type annotations from functions, let-bindings, and
constructed data because the syntax of the declarative system does
not permit them:

Proposition (Sound Algorithmic Typing).

If $\Gamma \vdash e \triangleright S$ or $\Gamma \vdash e \triangleleft S$, then $\Gamma \vdash \mathit{erase}(e) :: S$.

7. Related Work 

In this section, we highlight related approaches to statically verifying
features of dynamic languages. For a thorough introduction to
contract-based and other hybrid approaches, see

Dynamic Unions and Control Flow. Among the earliest attempts 
at mixing static and dynamic typing was adding the special type 
dynamic to a statically-typed language like ML [2]. In this approach,
an arbitrary value can be injected into dynamic, and a
typecase construct allows inspecting its precise type at run-time. 
However, one cannot guarantee that a particular dynamic value is 
of one of a subset of types (cf. negate from Section 2). Several
researchers have used union types and tag-test sensitive control-flow
analyses to support such idioms. Most recently, $\lambda_{TR}$ [34] and
$\lambda_S$ lfl3ll feature values of (untagged) union types that can be used
at more precise types based on control flow. In the former, each ex- 
pression is assigned two propositional formulas that hold when the 
expression evaluates to either true or false; these propositions are 
strengthened by recording the guard of an if-expression in the typ- 
ing environment when typing its branches. Typechecking proceeds 
by solving propositional constraints to compute, for each value at 
each program point, the set of tags it may correspond to. The latter 
shows how a similar strategy can be developed in an imperative set- 
ting, by coupling a type system with a data flow analysis. However, 
both systems are limited to ad-hoc unions over basic and function 
values. In contrast, System D shows how, by pushing all the infor- 
mation about the value (resp. reasoning about flow) into expressive, 
but decidable refinement predicates (resp. into SMT solvers), one 
can statically reason about significantly richer idioms (related tags, 
dynamic dictionaries, polymorphism, etc.). 

Records and Objects. There is a large body of work on type 
systems for objects fnl |24ll . Several early advances incorporate 
records into ML [26], but the use of records in these systems is unfortunately
unlikely to be flexible enough for dynamic dictionaries.
In particular, record types cannot be joined when they disagree on 
the type of a common field, which is crucially enabled by the use of 
the theory of finite maps in our setting. Recent work includes type 
systems for JavaScript and Ruby, presents a rich type system 
and inference algorithm for JavaScript, which uses row-types and 
width subtyping to model dictionaries (objects). The system does 
not support unions, and uses fixed field names. This issue is ad- 
dressed in O . which models dictionaries using row types labeled 
by singletons indexed by string constants, and depth subtyping. 
A recent proposal [35] incorporates an initialization phase during
which object types can be updated. However, these systems pre- 
clude truly dynamic dictionaries, which require dependent types, 
and moreover lack the control flow analysis required to support 
ad-hoc unions. DRuby 1131 is a powerful type system designed to 
support Ruby code that mixes intersections, unions, classes, and 
parametric polymorphism. DRuby supports "duck typing," by con- 
verting from nominal to structural types appropriately. However, it 
does not support ad-hoc unions or dynamic dictionary accesses. 



Dependent Types and SMT Solvers. The observation that ad- 
hoc unions can be checked via dependent types is not new. fT^l 
develops a dependent type system called guarded types that is used 
to describe records and ad-hoc unions in legacy Cobol programs 
that make extensive use of tag-tests, where the "tag" is simply the 
first few bytes of a structure. [16] presents an SMT-based system
for statically inferring dependent types that verify the safety of ad- 
hoc unions in legacy C programs. 0| describes how type-checking 
and property verification are two sides of the same coin for C 
(which is essentially uni-typed). It develops a precise logic-based
type system for C and shows how SMT solvers can be used for
type-checking. [3] uses refinement types to formalize similar ideas
in the context of Dminor, a first-order functional data description 
language with fixed-key records and run-time tag-tests. The authors 
show how unions and intersections can be expressed in refinements 
(and even collections, via recursive functions), and hence how SMT 
solvers can wholly discharge all subtyping obligations. However, 
the above techniques apply only to first-order languages, with static 
keys and dictionaries over base values. 

Combining Decision Procedures. Our approach of combining 
logical reasoning by SMT solvers and syntactic reasoning by sub- 
typing is reminiscent of work on combining decision procedures 
[23, 29]. However, such techniques require the theories being combined
to be disjoint; since our logic includes type terms which
themselves contain arbitrary terms, our theory of syntactic types 
cannot be separated from the other theories in our system, so these 
techniques cannot be directly applied. 

8. Conclusions and Future Work 

We have shown how, by nesting type predicates within refinement 
formulas and carefully interleaving syntactic- and SMT-based sub- 
typing, System D can statically type check dynamic programs that 
manipulate dictionaries, polymorphic higher-order functions and 
containers. Thus, we believe that System D can be a foundation 
for two distinct avenues of research: the addition of heterogeneous 
dictionaries to static languages like C#, Java, OCaml and Haskell, 
or dually, the addition of expressive static typing to dynamic lan- 
guages like Clojure, JavaScript, Racket, and Ruby. 

We anticipate several concrete lines of work that are needed to 
realize the above goals. First, we need to add support for references 
and imperative update, features common to most popular dynamic 
languages. Since every dictionary operation in an imperative lan- 
guage goes through a reference, we will need to extend the type 
system with flow-sensitive analyses, as in 12^1 and lfl3h . to precisely 
track the values stored in reference cells at each program point. Fur- 
thermore, to precisely track updates to dictionaries in the impera- 
tive setting, we will likely need to introduce some flow-sensitivity 
to the type system itself, adopting strong update techniques as in
(H and lH. Second, our system treats strings as atomic constants. 
Instead, it should be possible to incorporate modern decision procedures
for strings [15] to support logical operations on keys, which
would give even more precise support for reflective metaprogramming.
Third, we plan to extend our local inference techniques to automatically
derive polymorphic instantiations [25] and use Liquid
Types (2l]\ to globally infer refinement types. Finally, for dynamic 
languages, it would be useful to incorporate some form of staged 
analysis to support dynamic code generation 

A. Metatheory

This section deals with the formal properties of System D*. First,
we provide some definitions that were omitted from the presentation
of System D in Sections 3 and 4. Next, we provide the complete
definitions of stratified System D*. Finally, we specify the assumptions
and definitions specific to our refinement logic, and present
the details of the proof. Compared to the proof outline in Section 5,
we prove the progress and preservation parts of System D* Type
Soundness together, rather than with separate progress and Stratified
Preservation theorems.

A.1 Additional System D Definitions
A.1.1 Operational Semantics

The small-step operational semantics of System D expressions is
parametrized on a function $\delta$ that defines the behavior of constants
c that are functions. Dictionary operations like has, get, and set
are factored into the $\delta$ function. As terms are in A-normal form,
there is a single congruence rule, E-Compat.



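\[
\begin{array}{rcll}
c\ w & \hookrightarrow & \delta(c, w) \quad \text{if } \delta(c, w) \text{ is defined} & \text{[E-Delta]} \\
(\lambda x.\ e)\ w & \hookrightarrow & e[w/x] & \text{[E-App]} \\
\texttt{let } x = w \texttt{ in } e & \hookrightarrow & e[w/x] & \text{[E-Let]} \\
(\lambda A.\ e)\ [T] & \hookrightarrow & e & \text{[E-TApp]} \\
\texttt{if true then } e_1 \texttt{ else } e_2 & \hookrightarrow & e_1 & \text{[E-IfTrue]} \\
\texttt{if false then } e_1 \texttt{ else } e_2 & \hookrightarrow & e_2 & \text{[E-IfFalse]}
\end{array}
\]

\[
\frac{e_1 \hookrightarrow e_1'}
     {\texttt{let } x = e_1 \texttt{ in } e_2 \;\hookrightarrow\; \texttt{let } x = e_1' \texttt{ in } e_2}
\;\text{[E-Compat]}
\]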
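The reduction rules above can be exercised with a small interpreter. The following is a minimal sketch, not the authors' implementation: the tuple encoding of terms, the two primitives `succ` and `not` handled by `delta`, and the function names are all assumptions made for illustration, and type abstraction and application (E-TApp) are omitted.

```python
# Hypothetical term encoding (an assumption, not from the paper):
#   values:      Python int / bool / str literals, ("lam", x, body), ("prim", name)
#   expressions: ("var", x), ("app", w1, w2), ("let", x, e1, e2), ("if", w, e1, e2)

def is_value(e):
    return not isinstance(e, tuple) or e[0] in ("lam", "prim")

def subst(e, x, w):
    """Replace free occurrences of variable x in e by the closed value w."""
    if not isinstance(e, tuple) or e[0] == "prim":
        return e
    kind = e[0]
    if kind == "var":
        return w if e[1] == x else e
    if kind == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, w))
    if kind == "app":
        return ("app", subst(e[1], x, w), subst(e[2], x, w))
    if kind == "let":
        body = e[3] if e[1] == x else subst(e[3], x, w)
        return ("let", e[1], subst(e[2], x, w), body)
    if kind == "if":
        return ("if", subst(e[1], x, w), subst(e[2], x, w), subst(e[3], x, w))
    raise ValueError(kind)

def delta(c, w):
    """Behavior of primitive constants; None means delta(c, w) is undefined."""
    if c == "not" and isinstance(w, bool):
        return not w
    if c == "succ" and isinstance(w, int) and not isinstance(w, bool):
        return w + 1
    return None

def step(e):
    """Perform one reduction step; return None if e is a value or is stuck."""
    if is_value(e):
        return None
    kind = e[0]
    if kind == "app":
        f, w = e[1], e[2]
        if isinstance(f, tuple) and f[0] == "prim":
            return delta(f[1], w)                      # E-Delta
        if isinstance(f, tuple) and f[0] == "lam":
            return subst(f[2], f[1], w)                # E-App
        return None
    if kind == "let":
        _, x, e1, e2 = e
        if is_value(e1):
            return subst(e2, x, e1)                    # E-Let
        e1_next = step(e1)
        return None if e1_next is None else ("let", x, e1_next, e2)  # E-Compat
    if kind == "if":
        if e[1] is True:
            return e[2]                                # E-IfTrue
        if e[1] is False:
            return e[3]                                # E-IfFalse
    return None

def evaluate(e):
    """Iterate step until a value is reached; raise if the term gets stuck."""
    while not is_value(e):
        nxt = step(e)
        if nxt is None:
            raise RuntimeError("stuck term: %r" % (e,))
        e = nxt
    return e

# let y = succ 1 in let b = not false in if b then (lam z. z) y else 0
program = ("let", "y", ("app", ("prim", "succ"), 1),
           ("let", "b", ("app", ("prim", "not"), False),
            ("if", ("var", "b"),
             ("app", ("lam", "z", ("var", "z")), ("var", "y")),
             0)))
```

Because terms are in A-normal form, the only place a non-value may be reduced in context is the bound expression of a `let`, which is why `step` needs a single recursive call.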



A.1.2 Well-formedness 

We briefly supplement our discussion in Section 4.

Refinement Types. The well-formedness of formulas does not 
depend on type checking; all that is needed is the ability to syn- 
tactically distinguish between terms and propositions. We omit the 
straightforward rules for well-formed values. The important point 
is that a variable x may be used only if it is bound in $\Gamma$. Since our
refinement logic is unsorted, all logical predicate and function sym- 
bols must be defined for all values in any model of the logic. Thus, 
ill-typed expressions like true + false may evaluate to nonstan- 
dard "error" values in such models. This means that, for example, 
$\{\nu \mid \nu > 0\}$ is not the same as $\{\nu \mid \mathit{tag}(\nu) = \texttt{"Int"} \wedge \nu > 0\}$
since the former may also include non-integer values. Such values 
never arise at run-time, as the types of our primitive operations and 
constants guarantee that they only consume and produce standard, 
non-error values. 

Datatype Definitions. To enable a sound subtyping rule for con- 
structed types in the sequel, we check that the declared variance an- 
notations are respected within the type definition. The VarianceOk 
predicate is defined as 

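\[
\begin{array}{lcl}
\mathsf{VarianceOk}(A, +, \overline{T}) & \text{iff} & \left(\bigcup_j \mathsf{Poles}(A, +, T_j)\right) \subseteq \{+\} \\
\mathsf{VarianceOk}(A, -, \overline{T}) & \text{iff} & \left(\bigcup_j \mathsf{Poles}(A, +, T_j)\right) \subseteq \{-\} \\
\mathsf{VarianceOk}(A, =, \overline{T}) & & \text{always}
\end{array}
\]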

where Poles is a helper procedure that recursively walks formulas,
type terms, and types to record where type variables occur within
the types of the fields. $\mathsf{Poles}(A, +, T)$ computes a subset of $\{+, -\}$
that includes + (resp. -) if A occurs in at least one positive (resp.
negative) position inside T. For each type variable, these polarities
are computed across all field types in the definition and then
checked against its variance annotation. After successfully checking
that a type definition is well-formed, it is added to the globally
available type definition environment $\Psi$. For example, when checking
the well-formedness of the type term $C[\overline{T}]$, we make sure that
C is defined by testing for its presence in $\Psi$.



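\[
\neg\theta = \begin{cases} - & \text{if } \theta = + \\ + & \text{if } \theta = - \end{cases}
\]

\[
\begin{array}{lcl}
\mathsf{Poles}(A, \theta, \{\nu \mid p\}) & = & \mathsf{Poles}(A, \theta, p) \\[4pt]
\mathsf{Poles}(A, \theta, P(\overline{lw})) & = & \emptyset \\
\mathsf{Poles}(A, \theta, lw :: U) & = & \mathsf{Poles}(A, \theta, U) \\
\mathsf{Poles}(A, \theta, p \wedge q) & = & \mathsf{Poles}(A, \theta, p) \cup \mathsf{Poles}(A, \theta, q) \\
\mathsf{Poles}(A, \theta, p \vee q) & = & \mathsf{Poles}(A, \theta, p) \cup \mathsf{Poles}(A, \theta, q) \\
\mathsf{Poles}(A, \theta, \neg p) & = & \mathsf{Poles}(A, \neg\theta, p) \\[4pt]
\mathsf{Poles}(A, \theta, A) & = & \{\theta\} \\
\mathsf{Poles}(A, \theta, B) & = & \emptyset \\
\mathsf{Poles}(A, \theta, x{:}T_1 \to T_2) & = & \mathsf{Poles}(A, \neg\theta, T_1) \cup \mathsf{Poles}(A, \theta, T_2) \\
\mathsf{Poles}(A, \theta, \mathit{Null}) & = & \emptyset \\
\mathsf{Poles}(A, \theta, C[\overline{T}]) & = & \bigcup_i
  \begin{cases}
    \mathsf{Poles}(A, \theta, T_i) & \text{if } \theta_i = + \\
    \mathsf{Poles}(A, \neg\theta, T_i) & \text{if } \theta_i = - \\
    \mathsf{Poles}(A, +, T_i) \cup \mathsf{Poles}(A, -, T_i) & \text{if } \theta_i = {=}
  \end{cases}
\end{array}
\]

In the last case of this definition, $\Psi(C) = [\overline{\theta}\,\overline{B}]\{\cdots\}$.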
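The polarity computation can be made concrete with a short sketch. The tuple encoding of types and formulas, the simplified environment `psi` (which maps a constructor name only to its list of variance annotations), and the function names `poles` and `variance_ok` are assumptions for illustration; only the recursion itself follows the equations for Poles and VarianceOk given in this section.

```python
# Hypothetical AST encoding (an assumption, not from the paper):
#   ("refine", p)          refinement type {v | p}
#   ("pred",)              uninterpreted predicate P(lw...)
#   ("has_type", U)        type predicate lw :: U
#   ("and", p, q), ("or", p, q), ("not", p)
#   ("tvar", "A"), ("null",), ("arrow", T1, T2), ("con", "C", [T1, ...])

NEG = {"+": "-", "-": "+"}

def poles(a, theta, t, psi):
    """Set of polarities at which type variable a occurs in t, starting from theta."""
    kind = t[0]
    if kind in ("refine", "has_type"):
        return poles(a, theta, t[1], psi)
    if kind in ("pred", "null"):
        return set()
    if kind in ("and", "or"):
        return poles(a, theta, t[1], psi) | poles(a, theta, t[2], psi)
    if kind == "not":
        return poles(a, NEG[theta], t[1], psi)
    if kind == "tvar":
        return {theta} if t[1] == a else set()
    if kind == "arrow":
        # the domain flips polarity, the codomain keeps it
        return poles(a, NEG[theta], t[1], psi) | poles(a, theta, t[2], psi)
    if kind == "con":
        result = set()
        for theta_i, t_i in zip(psi[t[1]], t[2]):
            if theta_i == "+":
                result |= poles(a, theta, t_i, psi)
            elif theta_i == "-":
                result |= poles(a, NEG[theta], t_i, psi)
            else:  # invariant parameter: count both polarities
                result |= poles(a, "+", t_i, psi) | poles(a, "-", t_i, psi)
        return result
    raise ValueError(kind)

def variance_ok(a, theta, field_types, psi):
    """Check a declared variance annotation theta against all field types."""
    if theta == "=":
        return True
    seen = set()
    for t in field_types:
        seen |= poles(a, "+", t, psi)
    return seen <= {theta}

def ty(u):
    """Wrap a type term U as the refinement type {v | v :: U}."""
    return ("refine", ("has_type", u))

A = ty(("tvar", "A"))
producer = ty(("arrow", ty(("null",)), A))   # A only in the codomain
consumer = ty(("arrow", A, ty(("null",))))   # A only in the domain
both = ty(("arrow", A, A))                   # A in both positions
```

For instance, a field of type `consumer` is compatible with a `-` annotation on A but not with a `+` annotation, while `both` is only compatible with `=`.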
A.2 Stratified System D* 

The complete definitions of the System D* typing and subtyping
relations are given in Figures 5 and 6. The only differences compared to
the base system are that all typing and subtyping derivations are
now indexed with an integer n, and the clause implication relation
contains the new C-Valid-n rule. The well-formedness relations
remain unchanged.

A.3 Definitions and Assumptions 

We often use the following abbreviations for types and substitution 
into types. 

\[
\begin{array}{rcl}
\{p\} & = & \{\nu \mid p\} \\
p(lw) & = & p[lw/\nu] \\
[\![T]\!](lw) & = & [\![T]\!][lw/\nu]
\end{array}
\]

Proposition (Refinement Logic). The refinement logic underlying 
the type system at level zero is the quantifier-free fragment of first- 
order logic with equality and the decidable theories listed below. 
Logical terms of a universal sum sort called Val include integers, 
booleans, strings, and dictionaries (finite maps from strings to val- 
ues). Expressions, formulas and type terms can be encoded in the 
logic as uninterpreted constructed terms. Function and type func- 
tion terms are pairs of formal parameters and expression terms. 

• (Theory: Uninterpreted Functions) 

• (Theory: Linear Arithmetic) 

• (Theory: Dictionaries) 



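Type Checking   $\Gamma \vdash_n e :: S$

\[
\frac{}{\Gamma \vdash_n c :: \mathit{ty}(c)} \;\text{[T-Const]}
\]
\[
\frac{\Gamma(x) = T}{\Gamma \vdash_n x :: \{\nu \mid \nu = x\}} \;\text{[T-Var]}
\qquad
\frac{\Gamma(x) = \forall A.\, S}{\Gamma \vdash_n x :: \forall A.\, S} \;\text{[T-VarPoly]}
\]
\[
\frac{\Gamma \vdash_n w_1 :: \mathit{Dict} \quad \Gamma \vdash_n w_2 :: \mathit{Str} \quad \Gamma \vdash_n w_3 :: S}
     {\Gamma \vdash_n w_1 \mathbin{+\!+} \{w_2 \mapsto w_3\} :: \{\nu \mid \nu = w_1 \mathbin{+\!+} \{w_2 \mapsto w_3\}\}}
\;\text{[T-Extend]}
\]
\[
\frac{\Gamma \vdash_n w :: \mathit{Bool} \quad \Gamma, w = \mathit{true} \vdash_n e_1 :: S \quad \Gamma, w = \mathit{false} \vdash_n e_2 :: S}
     {\Gamma \vdash_n \texttt{if } w \texttt{ then } e_1 \texttt{ else } e_2 :: S}
\;\text{[T-If]}
\]
\[
\frac{\Gamma \vdash T_1 \quad \Gamma, x{:}T_1 \vdash_n e :: T_2}
     {\Gamma \vdash_n \lambda x.\ e :: \{\nu \mid \nu = \lambda x.\ e \wedge \nu :: x{:}T_1 \to T_2\}}
\;\text{[T-Fun]}
\]
\[
\frac{\Gamma \vdash_n w_1 :: \{\nu \mid \nu :: x{:}T_{11} \to T_{12}\} \quad \Gamma \vdash_n w_2 :: T_{11}}
     {\Gamma \vdash_n w_1\ w_2 :: T_{12}[w_2/x]}
\;\text{[T-App]}
\]
\[
\frac{A \notin \Gamma \quad \Gamma, A \vdash_n e :: S}
     {\Gamma \vdash_n \lambda A.\ e :: \forall A.\, S}
\;\text{[T-TFun]}
\qquad
\frac{\Gamma \vdash T \quad \Gamma \vdash_n w :: \forall A.\, S}
     {\Gamma \vdash_n w\ [T] :: \mathsf{Inst}(S, A, T)}
\;\text{[T-TApp]}
\]
\[
\frac{\forall i.\ \Gamma \vdash T_i \quad \Psi(C) = [\overline{\theta}\,\overline{A}]\{\overline{f{:}T'}\} \quad \forall j.\ \Gamma \vdash_n w_j :: \mathsf{Inst}(T'_j, \overline{A}, \overline{T})}
     {\Gamma \vdash_n C(\overline{w}) :: \{\nu \mid \mathsf{Fold}(C, \overline{T}, \overline{w})\}}
\;\text{[T-Fold]}
\]
\[
\frac{\Gamma \vdash_n e :: \{\nu \mid \nu :: C[\overline{T}]\}}
     {\Gamma \vdash_n e :: \{\nu \mid \mathsf{Unfold}(C, \overline{T})\}}
\;\text{[T-Unfold]}
\]
\[
\frac{\Gamma \vdash S_1 \quad \Gamma \vdash_n e_1 :: S_1 \quad \Gamma, x{:}S_1 \vdash_n e_2 :: S_2}
     {\Gamma \vdash_n \texttt{let } x = e_1 \texttt{ in } e_2 :: S_2}
\;\text{[T-Let]}
\]
\[
\frac{\Gamma \vdash_n e :: S' \quad \Gamma \vdash_n S' \sqsubseteq S \quad \Gamma \vdash S}
     {\Gamma \vdash_n e :: S}
\;\text{[T-Sub]}
\]

Figure 5. Type checking for System D*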



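We use the following axiomatization of dictionaries that can be
reduced to the theory of finite maps [20].

\[
\begin{array}{l}
\forall w.\ \neg\mathit{has}(\mathit{empty}, w) \\[4pt]
\forall w_1, w_2, w_3. \\
\quad \mathit{has}(w_1 \mathbin{+\!+} \{w_2 \mapsto w_3\}, w_2) \\
\quad \mathit{sel}(w_1 \mathbin{+\!+} \{w_2 \mapsto w_3\}, w_2) = w_3 \\
\quad \mathit{EqMod}(w_1 \mathbin{+\!+} \{w_2 \mapsto w_3\}, w_1, w_2) \\[4pt]
\forall w_1, w_2, x, y. \\
\quad \mathit{EqMod}(w_1, w_2, x) \wedge x \neq y \Rightarrow (\mathit{has}(w_1, y) \Leftrightarrow \mathit{has}(w_2, y)) \\
\quad \mathit{EqMod}(w_1, w_2, x) \wedge x \neq y \Rightarrow (\mathit{sel}(w_1, y) = \mathit{sel}(w_2, y))
\end{array}
\]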
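As a sanity check, the dictionary axioms above can be tested against one concrete model. The sketch below assumes Python dicts as the model of finite maps; the function names, the choice of `None` for `sel` on a missing key (which the axioms leave unconstrained), and the sample data are all illustrative assumptions, and a finite test of this kind is evidence rather than a proof.

```python
from itertools import product

def has(d, k):
    return k in d

def sel(d, k):
    # sel on an absent key is not constrained by the axioms; None is an arbitrary choice
    return d.get(k)

def extend(d, k, v):
    """Model of d ++ {k -> v}: a fresh dictionary with k bound to v."""
    out = dict(d)
    out[k] = v
    return out

def eq_mod(d1, d2, k):
    """d1 and d2 agree on every key other than k."""
    others = (set(d1) | set(d2)) - {k}
    return all(has(d1, y) == has(d2, y) and sel(d1, y) == sel(d2, y) for y in others)

def check_dictionary_axioms(dicts, keys, vals):
    """Assert each axiom on every combination of the given sample values."""
    for k in keys:
        assert not has({}, k)                       # empty has no keys
    for d, k, v in product(dicts, keys, vals):
        e = extend(d, k, v)
        assert has(e, k)                            # extended key is present
        assert sel(e, k) == v                       # and maps to the new value
        assert eq_mod(e, d, k)                      # nothing else changed
    for d1, d2, x, y in product(dicts, dicts, keys, keys):
        if eq_mod(d1, d2, x) and x != y:
            assert has(d1, y) == has(d2, y)
            assert sel(d1, y) == sel(d2, y)
    return True

sample_dicts = [{}, {"a": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 2}]
sample_keys = ["a", "b", "c"]
sample_vals = [1, 2]
```

Calling `check_dictionary_axioms(sample_dicts, sample_keys, sample_vals)` exercises each axiom on every combination of the sample dictionaries, keys, and values.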

(Assumption: Tag Function) 

We assume the presence of a unary function symbol tag that 
maps values to strings. 



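Subtyping   $\Gamma \vdash_n S_1 \sqsubseteq S_2$

\[
\frac{x \text{ fresh} \quad p_1' = p_1[x/\nu] \quad p_2' = p_2[x/\nu] \quad \mathsf{Normalize}(p_2') = \bigwedge_i (q_i \Rightarrow r_i) \quad \forall i.\ \Gamma, p_1' \vdash_n q_i \Rightarrow r_i}
     {\Gamma \vdash_n \{\nu \mid p_1\} \sqsubseteq \{\nu \mid p_2\}}
\;\text{[S-Mono]}
\]
\[
\frac{\Gamma \vdash_n S_1 \sqsubseteq S_2}
     {\Gamma \vdash_n \forall A.\, S_1 \sqsubseteq \forall A.\, S_2}
\;\text{[S-Poly]}
\]

Clause Implication   $\Gamma \vdash_n q \Rightarrow r$

\[
\frac{\mathsf{Valid}([\![\Gamma]\!] \wedge q \Rightarrow r)}
     {\Gamma \vdash_n q \Rightarrow r}
\;\text{[C-Valid]}
\qquad
\frac{\mathcal{I}_n \models [\![\Gamma]\!] \wedge q \Rightarrow r}
     {\Gamma \vdash_n q \Rightarrow r}
\;\text{[C-Valid-n]}
\]
\[
\frac{\exists j.\ \mathsf{Valid}([\![\Gamma]\!] \wedge q \Rightarrow lw_j :: U) \quad \Gamma, q \vdash_n U <: U_j}
     {\Gamma \vdash_n q \Rightarrow \bigvee_i lw_i :: U_i}
\;\text{[C-ImpSyn]}
\]

Syntactic Subtyping   $\Gamma \vdash_n U_1 <: U_2$

\[
\frac{}{\Gamma \vdash_n A <: A} \;\text{[U-Var]}
\qquad
\frac{}{\Gamma \vdash_n \mathit{Null} <: C[\overline{T}]} \;\text{[U-Null]}
\]
\[
\frac{\Gamma \vdash_n T_{21} \sqsubseteq T_{11} \quad \Gamma, x{:}T_{21} \vdash_n T_{12} \sqsubseteq T_{22}}
     {\Gamma \vdash_n x{:}T_{11} \to T_{12} <: x{:}T_{21} \to T_{22}}
\;\text{[U-Arrow]}
\]
\[
\frac{\Psi(C) = [\overline{\theta}\,\overline{A}]\{\cdots\} \quad
      \forall i.\ \text{if } \theta_i \in \{+, =\} \text{ then } \Gamma \vdash_n T_{1i} \sqsubseteq T_{2i} \quad
      \forall i.\ \text{if } \theta_i \in \{-, =\} \text{ then } \Gamma \vdash_n T_{2i} \sqsubseteq T_{1i}}
     {\Gamma \vdash_n C[\overline{T_1}] <: C[\overline{T_2}]}
\;\text{[U-Datatype]}
\]

Figure 6. Subtyping for System D*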



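\[
\begin{array}{rcl}
\mathit{tag}(\mathit{true}) & = & \texttt{"Bool"} \\
\mathit{tag}(\mathit{false}) & = & \texttt{"Bool"} \\
\mathit{tag}(n) & = & \texttt{"Int"} \\
\mathit{tag}(\lambda x.\ e) & = & \texttt{"Fun"} \\
\mathit{tag}(\lambda A.\ e) & = & \texttt{"TFun"} \\
\mathit{tag}(w_1 \mathbin{+\!+} \{w_2 \mapsto w_3\}) & = & \texttt{"Dict"} \\
\mathit{tag}(C(\overline{w})) & = & \texttt{"Dict"} \\
\mathit{tag}(c) & = & \texttt{"Fun"} \quad \text{if } c \text{ is a function}
\end{array}
\]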
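The tag equations above can be mirrored by a run-time tag function. The sketch below is only an analogy under stated assumptions: Python values stand in for System D values, and the classes `TypeLambda` and `Constructed` are hypothetical stand-ins for type abstractions and constructed data, which Python does not have natively. Only the cases listed above are covered.

```python
class TypeLambda:
    """Hypothetical stand-in for a type abstraction (lambda A. e)."""
    def __init__(self, body):
        self.body = body

class Constructed:
    """Hypothetical stand-in for constructed data C(w1, ..., wn)."""
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields

def tag(v):
    """Return the tag string of a value, following the equations above."""
    if isinstance(v, bool):        # checked before int: bool subclasses int in Python
        return "Bool"
    if isinstance(v, int):
        return "Int"
    if isinstance(v, TypeLambda):
        return "TFun"
    if isinstance(v, (dict, Constructed)):
        return "Dict"              # constructed data is tagged like a dictionary
    if callable(v):
        return "Fun"               # lambdas and function-valued constants
    raise ValueError("no tag listed for %r" % (v,))
```

Note that constructed data and ordinary dictionaries share the tag "Dict", matching the requirement that the implementation treat the two uniformly.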

• (Fact: Validity) 

We write $\mathsf{Valid}(p)$ to mean that, as usual, p is satisfied in all
interpretations. In the C-Valid rule, we appeal to a decision
procedure to check whether $\mathsf{Valid}(p)$.

• (Assumption: Boolean Values)

We assume $\mathsf{Valid}(\mathit{tag}(w) = \texttt{"Bool"})$ iff $w \in \{\mathit{true}, \mathit{false}\}$.

• (Fact: Free Variable Substitution)

If x appears free in p and q,
then $p \Rightarrow q$ implies $p[w/x] \Rightarrow q[w/x]$ for all w.

• (Fact: Uninterpreted Predicate Substitution)

If P is an uninterpreted predicate symbol in p and q,
then $p \Rightarrow q$ implies $p[P'/P] \Rightarrow q[P'/P]$ for all P'.

Assumption (Constant Types). For every constant $c \in \mathit{Dom}(\mathit{ty})$,
the following properties hold.

1. (Well-formed). $\vdash \mathit{ty}(c)$.

2. (Normal).
   $\mathit{ty}(c) = \{\nu \mid \nu = c \wedge p\}$ where either
   $p = \mathit{true}$ or
   $p = \nu :: x{:}T_1 \to T_2$.

3. (App).
   If $\mathit{ty}(c) = \{\nu \mid \nu = c \wedge \nu :: x{:}T_1 \to T_2\}$,
   then for all $w'$ and n such that $\vdash_n w' :: T_1$,
   $\delta(c, w')$ is defined and $\vdash_n \delta(c, w') :: T_2[w'/x]$.

4. (Valid).
   $\mathsf{Valid}(\mathit{ty}(c)[c/\nu])$.

In other words, we add these to the initial
typing environment.

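Definition (Type Predicate Interpretation). The System D Interpretation
at level n interprets type predicates as follows.

• $\mathcal{I}_n \models w :: x{:}T_{11} \to T_{12}$ iff $\vdash x{:}T_{11} \to T_{12}$ and either:

  1. $w = \lambda x.\ e$ and
     $x{:}T_{11} \vdash_{n-1} e :: T_{12}$; or

  2. $w = c$,
     $\mathit{ty}(c) = \{\nu \mid \nu = c \wedge \nu :: x{:}T_{01} \to T_{02}\}$, and
     $\vdash_{n-1} x{:}T_{01} \to T_{02} <: x{:}T_{11} \to T_{12}$.

• $\mathcal{I}_n \models w :: A$ never.

• $\mathcal{I}_n \models w :: \mathit{Null}$ iff $w = \mathit{null}$.

• $\mathcal{I}_n \models w :: C[\overline{T}]$ iff $\vdash \overline{T}$, $\Psi(C) = [\overline{\theta}\,\overline{A}]\{\overline{f{:}T'}\}$, and either:

  1. $w = \mathit{null}$; or

  2. $w = C(\overline{w})$ and
     for all j, $\vdash_{n-1} w_j :: \mathsf{Inst}(T'_j, \overline{A}, \overline{T})$.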

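Assumption (Datatype Representation). This assumption requires
that the implementation treats constructed data just like ordinary
dictionaries. Let $\Psi(C) = [\overline{\theta}\,\overline{A}]\{\overline{f{:}T'}\}$.

If $\mathcal{I}_n \models w :: C[\overline{T}]$,
then $\mathcal{I}_n \models \mathit{tag}(w) = \texttt{"Dict"}$
and $\mathcal{I}_n \models \bigwedge_j [\![\mathsf{Inst}(T'_j, \overline{A}, \overline{T})]\!](\mathit{sel}(w, f_j))$.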

A.4 Formal Properties 

To reduce clutter, we elide the well-formedness requirements of all 
expressions, formulas, types, type terms, typing environments, and 
type definitions mentioned in the lemmas and theorems that follow. 

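1 Lemma (Inversion).

1. If $\Gamma \vdash_n x{:}T_{11} \to T_{12} <: x{:}T_{21} \to T_{22}$,
   then $\Gamma \vdash_n T_{21} \sqsubseteq T_{11}$ and $\Gamma, x{:}T_{21} \vdash_n T_{12} \sqsubseteq T_{22}$.

2. If $\Gamma \vdash_n \lambda x.\ e :: S$,
   then $\Gamma \vdash_n \lambda x.\ e :: \{\nu = \lambda x.\ e \wedge \nu :: x{:}T_1 \to T_2\}$.

Proof. By induction. Note that we have only listed the properties
we will need to use. □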

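2 Lemma (Reflexive Subtyping).

1. $\Gamma \vdash_n p \Rightarrow p$

2. $\Gamma \vdash_n U <: U$

3. $\Gamma \vdash_n S \sqsubseteq S$

Proof. By mutual induction. □

3 Lemma (Transitive Subtyping).

1. If $\Gamma \vdash_n p \Rightarrow q$ and $\Gamma \vdash_n q \Rightarrow r$, then $\Gamma \vdash_n p \Rightarrow r$.

2. If $\Gamma \vdash_n U_1 <: U_2$ and $\Gamma \vdash_n U_2 <: U_3$, then $\Gamma \vdash_n U_1 <: U_3$.

3. If $\Gamma \vdash_n S_1 \sqsubseteq S_2$ and $\Gamma \vdash_n S_2 \sqsubseteq S_3$, then $\Gamma \vdash_n S_1 \sqsubseteq S_3$.

Proof. By mutual induction. □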

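4 Lemma (Narrowing). Suppose $\Gamma \vdash_n S \sqsubseteq S'$.

1. If $\Gamma, x{:}S' \vdash_n p \Rightarrow q$, then $\Gamma, x{:}S \vdash_n p \Rightarrow q$.

2. If $\Gamma, x{:}S' \vdash_n U_1 <: U_2$, then $\Gamma, x{:}S \vdash_n U_1 <: U_2$.

3. If $\Gamma, x{:}S' \vdash_n S_1 \sqsubseteq S_2$, then $\Gamma, x{:}S \vdash_n S_1 \sqsubseteq S_2$.

4. If $\Gamma, x{:}S' \vdash_n e :: S_1$, then $\Gamma, x{:}S \vdash_n e :: S_1$.

Proof. By mutual induction. □

5 Lemma (Weakening). Suppose $\Gamma = \Gamma_1, \Gamma_2$ and $\Gamma'$ is such that
$\vdash \Gamma'$ and either

$\Gamma' = \Gamma_1, x{:}S, \Gamma_2$ or $\Gamma' = \Gamma_1, p, \Gamma_2$ or $\Gamma' = \Gamma_1, A, \Gamma_2$.

1. If $\Gamma \vdash_n q_1 \Rightarrow q_2$, then $\Gamma' \vdash_n q_1 \Rightarrow q_2$.

2. If $\Gamma \vdash_n U_1 <: U_2$, then $\Gamma' \vdash_n U_1 <: U_2$.

3. If $\Gamma \vdash_n S_1 \sqsubseteq S_2$, then $\Gamma' \vdash_n S_1 \sqsubseteq S_2$.

4. If $\Gamma \vdash_n e :: S$, then $\Gamma' \vdash_n e :: S$.

Proof. By mutual induction. □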

6 Lemma (Free Variables in Subtyping). Recall that the variable
$\nu$ can appear free in the formulas, type terms, and types mentioned
in the outputs of the following derivations. Suppose lw is a closed,
well-formed value.

1. If $\Gamma \vdash_n p \Rightarrow q$, then $\Gamma \vdash_n p[lw/\nu] \Rightarrow q[lw/\nu]$.

2. If $\Gamma \vdash_n U_1 <: U_2$, then $\Gamma \vdash_n U_1[lw/\nu] <: U_2[lw/\nu]$.

3. If $\Gamma \vdash_n S_1 \sqsubseteq S_2$, then $\Gamma \vdash_n S_1[lw/\nu] \sqsubseteq S_2[lw/\nu]$.

Proof. By mutual induction. The premise of the C-Valid-n case
is $\mathcal{I}_n \models [\![\Gamma]\!] \wedge p \Rightarrow q$. Since $\nu$ appears free in the implication,
by Free Variable Substitution, $\mathcal{I}_n \models [\![\Gamma]\!] \wedge p[lw/\nu] \Rightarrow q[lw/\nu]$.
Thus, by C-Valid-n, $\Gamma \vdash_n p[lw/\nu] \Rightarrow q[lw/\nu]$. The rest of the
proof is a straightforward induction. □

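7 Lemma (Sound Variance).

1. Suppose $\Gamma \vdash_n T_1 \sqsubseteq T_2$.

   (a) If B appears only positively in T,
       then $\Gamma \vdash_n \mathsf{Inst}(T, B, T_1) \sqsubseteq \mathsf{Inst}(T, B, T_2)$.

   (b) If B appears only positively in p,
       then $\Gamma \vdash_n \{\mathsf{Inst}(p, B, T_1)\} \sqsubseteq \{\mathsf{Inst}(p, B, T_2)\}$.

2. Suppose $\Gamma \vdash_n T_1 \sqsubseteq T_2$.

   (a) If B appears only negatively in T,
       then $\Gamma \vdash_n \mathsf{Inst}(T, B, T_2) \sqsubseteq \mathsf{Inst}(T, B, T_1)$.

   (b) If B appears only negatively in p,
       then $\Gamma \vdash_n \{\mathsf{Inst}(p, B, T_2)\} \sqsubseteq \{\mathsf{Inst}(p, B, T_1)\}$.

3. Suppose $\Gamma \vdash_n T_1 \sqsubseteq T_2$ and $\Gamma \vdash_n T_2 \sqsubseteq T_1$.

   (a) Then $\Gamma \vdash_n \mathsf{Inst}(T, B, T_2) \sqsubseteq \mathsf{Inst}(T, B, T_1)$
       and $\Gamma \vdash_n \mathsf{Inst}(T, B, T_1) \sqsubseteq \mathsf{Inst}(T, B, T_2)$.

   (b) Then $\Gamma \vdash_n \{\mathsf{Inst}(p, B, T_2)\} \sqsubseteq \{\mathsf{Inst}(p, B, T_1)\}$
       and $\Gamma \vdash_n \{\mathsf{Inst}(p, B, T_1)\} \sqsubseteq \{\mathsf{Inst}(p, B, T_2)\}$.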

Proof. The proofs of (1) and (2) are by mutual induction on types 
and formulas. The proof of (3) is a stand-alone induction on types 
and formulas. 

Proofof(la). 

LetT= {u\p}. 

The goal follows by IH (lb), since 

lnst({i/|p},73,Ti) = {^|lnst(p,73,Ti)}and 
lnst(Hp},73,T 2 ) = Hlnst(p,73,T 2 )}. 



Proof of (lb). 
Case: p = Iw. 

Trivial, since lnst(p, B, Ti) = lnst(p, B, T 2 ) = p. 
Cases: p = qi A q 2 , p = gi V 172 • 

By IH (lb), C-VALID, and S-MONO. 
Case: p = ^g. 

By IH (2b), C-VALID, and S-MONO. 
Case: p — Iw :: U . 

Subcase: U = B. 

By definition, lnst(p, B,Ti) = iT^lw). 
By definition, \nst(p,B,T 2 ) = lT 2 }(lw). 
By Free Variables in Subtyping, h„ T\[lw/v] E T2[^/f]- 
Thatis, h„ {[Ti](to)} C {Pa] (*«>)}■ 
Subcase: U = x : S\ —¥ S2. 

(Note that we are using Si and S2 for types.) 
LetSii = lnst(S , i,B,Ti) and S12 = Inst (A, B, T 2 ). 
LetS 2 i = lnst(S 2 ,B,r 1 )andS'22 = lnst(S 2 ,B,T 2 ). 
Since B appears only pos in T, it appears only neg in Si, 

by the well-formedness of the type definition and 

the definition of Poles. 
ByIH(2a), h„ S12 E Sn. 

Since B appears only pos in T, it appears only pos in 52- 

BylH(la), h„ S21 E S22. 

By Weakening, x: S21 h„ S21 E S22- 

By U-ARROW, 

h n X : Sn — > S21 <: x : S12 — > S 22 . 
By C-VALID, 

p :: x : Sn — > S21 h„ true ^> p :: x : Sn — > S21. 
By C-lMPSYN, 

h» p :: a; : — >■ S21 ^ p :: X : S12 -> S22. 
By S-MONO, 

h„ {p :: x:Sii — > S21} E {p a;:Si2 ->• S22}. 
Thatis, h„ lnst(T,B,Ti) C lnst(T, B, T 2 ). 

Subcase: J7 = C[S], where *(C) = [0A]{ ■ ■ ■ }. 

(Note that we are using S for a type.) 

Subsubcase: 9 = +. 

Let Si = lnst(S, B, Ti) and S 2 = lnst(S, B, T 2 ). 

Since B appears only pos in T, it appears only pos in S. 

BylH(la), h n Si E S 2 . 

By U-DATATYPE, h n C[Si] <: C[S 2 ]. 

By C-VALID, C-ImpSyn, and S-MONO, 
h n {p :: C[Si]} E {P ■■■■ C[S a )}. 
Subsubcase: 9 = -. 

Similar. 

Subsubcase: 9 = =. 

Since _B appears only pos in T, it cannot appear in S. 
Thus, lnst(T, B, Ti) = lnst(T,B,T 2 ) = T. 

Proof of (2a) and ( 2b ). Similar. 

Proof of (3). Straightforward induction. 

□ 

8 Lemma (Lifting).

1. If $\Gamma \vdash_n p \Rightarrow q$, then $\Gamma \vdash_{n+1} p \Rightarrow q$.

2. If $\Gamma \vdash_n U_1 <: U_2$, then $\Gamma \vdash_{n+1} U_1 <: U_2$.

3. If $\Gamma \vdash_n S_1 \sqsubseteq S_2$, then $\Gamma \vdash_{n+1} S_1 \sqsubseteq S_2$.

4. If $\Gamma \vdash_n e :: S$, then $\Gamma \vdash_{n+1} e :: S$.

5. If $\mathcal{I}_n \models p$, then $\mathcal{I}_{n+1} \models p$.

Furthermore, for each of the first four properties, the size of the
output derivation is the same size as the original.

Proof. By mutual induction. In the C-Valid-n case of (1), the
conclusion follows by C-Valid-n after applying IH (5). The type
predicate case for (5) follows from IH (4). □

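9 Lemma (Strengthening). Suppose $\mathcal{I}_n \models p$.

1. If $p, \Gamma \vdash_n q_1 \Rightarrow q_2$, then $\Gamma \vdash_n q_1 \Rightarrow q_2$.

2. If $p, \Gamma \vdash_n U_1 <: U_2$, then $\Gamma \vdash_n U_1 <: U_2$.

3. If $p, \Gamma \vdash_n S_1 \sqsubseteq S_2$, then $\Gamma \vdash_n S_1 \sqsubseteq S_2$.

4. If $p, \Gamma \vdash_n e :: S$, then $\Gamma \vdash_n e :: S$.

Furthermore, for each property, the size of the output derivation is
the same size as the original.

Proof. By mutual induction.

Proof of (1).

Case: C-Valid-n, with premise $\mathcal{I}_n \models [\![p, \Gamma]\!] \wedge q_1 \Rightarrow q_2$
and conclusion $p, \Gamma \vdash_n q_1 \Rightarrow q_2$.

By expanding the embedding, $\mathcal{I}_n \models p \wedge [\![\Gamma]\!] \wedge q_1 \Rightarrow q_2$.
Thus, $\mathcal{I}_n \models p \Rightarrow ([\![\Gamma]\!] \wedge q_1 \Rightarrow q_2)$.
Because of the assumption, $\mathcal{I}_n \models [\![\Gamma]\!] \wedge q_1 \Rightarrow q_2$.
By C-Valid-n, $\Gamma \vdash_n q_1 \Rightarrow q_2$.

Case: C-Valid, with premise $\mathsf{Valid}([\![p, \Gamma]\!] \wedge q_1 \Rightarrow q_2)$
and conclusion $p, \Gamma \vdash_n q_1 \Rightarrow q_2$.

By Validity, $\mathcal{I}_n \models [\![p, \Gamma]\!] \wedge q_1 \Rightarrow q_2$.
The rest of the reasoning in this case follows the previous case.

Case: C-ImpSyn, with premise $\exists j.\ \mathsf{Valid}([\![p, \Gamma]\!] \wedge q \Rightarrow lw_j :: U)$
and conclusion $p, \Gamma \vdash_n q \Rightarrow \bigvee_i lw_i :: U_i$.

By C-Valid, $\Gamma \vdash_n q \Rightarrow lw_j :: U$.
By IH (2), $\Gamma, q \vdash_n U <: U_j$.
By C-ImpSyn, $\Gamma \vdash_n q \Rightarrow \bigvee_i lw_i :: U_i$.

Proof of (2), (3), and (4). Straightforward induction.

□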

The following lemma intuitively captures the relationship between
the type system and the underlying refinement logic: if a closed
value w can be given the type T with a derivation at level n, then the
formula $[\![T]\!](w)$ is true in the System D Interpretation at level n+1.
This property plays a crucial role in the proof of Value Substitution.
Notice that nothing is said about values that are assigned polytypes.

Because the following lemma works only with the empty environment,
the Strengthening lemma is helpful for proving the
C-ImpSyn and S-Mono cases, which have premises that use non-empty
environments.

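10 Main Lemma (Satisfiable Typing).

1. If $\vdash_n p \Rightarrow q$, then $\mathcal{I}_{n+1} \models p \Rightarrow q$.

2. If $\vdash_n U_1 <: U_2$, then $\mathcal{I}_{n+1} \models \nu :: U_1 \Rightarrow \nu :: U_2$.

3. If $\vdash_n \{\nu \mid p\} \sqsubseteq \{\nu \mid q\}$, then $\mathcal{I}_{n+1} \models p \Rightarrow q$.

4. If $\vdash_n w :: T$, then $\mathcal{I}_{n+1} \models [\![T]\!](w)$.

In the first three properties, the variable $\nu$ appears free in the
implication. Thus, they are implicitly quantified over all values.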



Proof. By mutual induction on the size of derivations, not by struc- 
tural induction. The reason for this induction principle is that in the 
C-ImpSyn and S-MONO cases, subderivations are manipulated by 
Lifting and Strengthening (which preserve derivation size) before 
appealing to the induction hypothesis. 



Proof of (I). 



Valid(tr«e Ap => q) 



Case: C-VALID. h n p ee> q 

By Validity, I n \= true A p =>• q, and thus, I n \= p =>■ q. 
By Lifting, X n+ i \= p q. 

\= true A p ^ q 

Case: C-VALID-N. h„ p ^ q 

By Validity and Lifting, In+i \= p => q. 

3j. Valid([0] Ap Iwj :: U) pV- n U < 

Case: C-ImpSyn. 



h n p s> Vi luii :: Ui 



We assume I n +i |= p and will prove Xn+i (= Vi iiUj :: C/i. 

By C-VALID, h n p ^ Iwj :: [7. 

ByIH(l),X„+i ^p^ lWj :: U. 

Thus, I n+ i |= liVj :: U. 

By Lifting, p h„+i (7 <: t/j. 

By Strengthening, h n+ i [7 <: [7,. 

This last derivation is the same size as p h n U <: ?7j 

since Lifting and Strengthening preserve derivation size. 
Thus, we can apply the induction hypothesis. 
By IH (2), Tn+i \= v :: U v.-.Uj. 
Thus, I n +i \= luij :: Uj. 
Thus, X n+ i \= Vi luii :: Ui. 



Proof of (2). 



H„ T 21 C T n x:T 2 i h n T 12 C T 2: 



Case: U-ARROW. h n a; : Tn -> Ti 2 <: a; : T 21 -t T 22 

Let Ui = x : Tn -> Ti 2 and £/ 2 = a: : T 2 i -> T 22 . 

We assume |= v :: f/i and will prove I n +i |= f t7 2 . 

By Type Predicate Interpretation, there are two cases. 

Subcase: v = Xx. eanda;:Tn h n _i e :: T\ 2 . 

By Lifting, x:T n h„ e :: Ti 2 . 
By Narrowing, x : T 2 i h n e :: Tn. 
ByT-SUB,a;:r 2 i h„ e :: T 22 . 

Thus, by Type Predicate Interpretation, I n +i \= Xx. e :: Ui. 
Subcase: 

to = c, tj/(c) = = c A v :: i: Th — s> To 2 }, and 

h„_i x:T i ->■ T 02 <: i:Tn-> T12. 

By Lifting, h n x :T 01 -> T 02 <: x : Tn -> T 12 . 

By Inversion, h„ Tn C T i andx:Tn h„ T 02 C Ti 2 . 

By Transitive Subtyping, h„ T 2 i C Toi- 

By Narrowing, a; : T 2 i h n T 02 C Ti 2 . 

By Transitive Subtyping, a;:T 2 i h n T) 2 C T 22 . 

By U- ARROW, h n x:T i -^T 02 <: x:T 2 i ^T 22 . 

By Type Predicate Interpretation, I n +i \= c :: x : T21 — > T 22 . 

Case: U-VAR. Trivial. 



We consider the special case when there is exactly one type pa- 
rameter A with variance annotation 9. The type actuals are, there- 
fore, labeled Tn and T 21 . The reasoning extends to an arbitrary 
number of type parameters by a strong induction on the length of 
the sequence. 

Subcase: 9 = +. 

Consider an arbitrary wo such fhatl„+i |= too C[Tn]. 
By Type Predicate Interpretation, there are two cases. 
In one case, too = null, and trivially 

l n+1 \= null :: Cpk]. 
In the other case, too = C(w) and 

for all j, h n Wj :: lnst(Tj, A,Tn). 
By well-formedness of the type definition, A appears only 

positively in every Tj. 
By Sound Variance (1), 

h„ \nst{T$,A,T n ) C lnst(Tj,A,T 2 i). 
Uj_ ByT-SuB, \~ n Wj :: lnst(Tj, A,T 2 i). 

By Type Predicate Interpretation, X„+i |= C(to) :: C[T 2 i]. 

Subcase: 6 = -. Similar, using Sound Variance (2). 

Subcase: 9 = =. Similar, using Sound Variance (3). 

Proof of (3). Only the rule for monotypes applies. 

x fresh p = p[x/u] q = q[x/v] 
V(gu, qii) G Normalize(g'). p h n q u ^ qn 
Case: S-Mono. l~n{Hp} E { v I ?} 

Note that the alpha-renaming preserves satisfiability. 
So we assume I„+i |= p and then prove X n +i |= (?'■ 
By Strengthening on each premise, h n qii ^> q 2 i. 
Each of these derivations has the same size as the original. 
Thus, by IH (1) on each, I n +i |= qu => qu- 
Thus, X n+ i \= At qu => q 2i . 

Thus, by equivalence of normalized formulas, |= g'. 

Proof of (4). We only need to consider the rules that can derive a 
monotype T for a value to in the empty environment. 

Case: T-CONST. By Constant Types (Valid). 

Case: T-EXTEND. Trivially, since [y — w)[w/v] = w = w. 

U = x:T 1 -+T 2 x:T x h n e :: T 2 

Case: T-FUN. h n Aa;. e :: {v = \x. e Av :: (7} 

By Type Predicate Interpretation, T n +i \= Xx. e :: U. 
Furthermore, by Validity, I n +i \= Xx. e = Xx. e. 



Case: U-NULL. 



By Type Predicate Interpretation. 



Case: U-DATATYPE. 



*(C) = [0A]{f:T'} 

Vi. if 0, 6 {+, =} then h„ Ti, C T 2l 

Vi. ifgi 6 =} then h n T 2i □ 7\j 

h„ C*[TY] <: C[Tb] 



*(C) = [9A]{f:T'_} _ 
Wj. h n Wj :: Inst^-.yl.T) 



Case: T-FOLD. 



h„ C(to) :: {^|Fold(C,T,w)} 

We consider each of the components of the formula from Fold. 
By Validity, T n +i \= C(w) ^ null. 
By Type Predicate Interpretation, X„+i |= C(to) :: C[T]. 
By Datatype Representation, + i |= tag(C(w)) — "Diet" 
andX n+ i |= Aj sel(C(w), fj) — Wj. 



h„TO :: {v\v::C[T\} 



Case: T-UNFOLD. h„ to :: {v Unfold(C,T)} 
By IH (3). 2^+i ^w::C\T]. 

The goal follows by Type Predicate Interpretation and 
Datatype Representation. 

h n to :: T h n T C T 
Case: T-SUB. h„ w :: T 



[T-Unfold] 



ByIH(3),X„ +1 h IT'IH => [T](to). 
ByIH(4),X n+1 |= [T'](to). 
Thus,X n+1 |= P1H. 

□ 

In the following lemma we lift substitution to judgments in the
obvious way. For example, we write $(\Gamma \vdash_n e :: S)[w/x]$ to mean
$\Gamma[w/x] \vdash_n e[w/x] :: S[w/x]$.

11 Main Lemma (Stratified Value Substitution). Let $\vdash_n w :: S$.

1. If $x{:}S, \Gamma \vdash_n p \Rightarrow q$, then $(\Gamma \vdash_{n+1} p \Rightarrow q)[w/x]$.

2. If $x{:}S, \Gamma \vdash_n U_1 <: U_2$, then $(\Gamma \vdash_{n+1} U_1 <: U_2)[w/x]$.

3. If $x{:}S, \Gamma \vdash_n S_1 \sqsubseteq S_2$, then $(\Gamma \vdash_{n+1} S_1 \sqsubseteq S_2)[w/x]$.

4. If $x{:}S, \Gamma \vdash_n e :: S'$, then $(\Gamma \vdash_{n+1} e :: S')[w/x]$.

Proof. By mutual induction. In the C-VALID and C-VALID-N 
cases, we will distinguish between whether S is a monotype or 
a polymorphic type scheme. In all other cases, this difference will 
not affect the reasoning. The T-VAR case is interesting because 
singleton types must be preserved after substitution. 

Proof of (1). Recall that we use the notation $p(x)$ to mean $p[x/\nu]$
and $[\![T]\!](x)$ to mean $[\![T]\!][x/\nu]$. Furthermore, we lift this to $[\![\Gamma]\!](x)$
in the obvious way.

X n j= {x:S,r}Ap^> q 

Case: C-VALID-N. i:S,rh„p^g 

Subcase: S = T, 

Thus, Xn \= [T](x) A [r](x) A p(x) =J> q(x). 

Thus, Zn \= lTj(x) => [r](i)Ap(i) => q(x). 

By Lifting, X n+1 |= |T](x) [r](x) A p(x) =► <?(x). 

By Satisfiable Typing, X n+1 |= [T](tu). 

Thus, J n+ i |= [r][tu/x] Ap[w/x] =>• g[to/x]. 

By C-Valid-n, r[to/x] h n+ i p[w/x] ^> g[w/a;]. 

Subcase: S = VA S'. 

Thus, X n |= true A |T](a:) A p(x) =>■ 5(35). 

Thus,I n |= [ri(x) Ap(x) => q(x). 

Thus,X ?l |= [r]j[to/x] Ap[w/x] =>- q[w/x]. 

By Lifting, X n+1 |= [r][io/x] Ap[w/i] => g[w/x]. 

By C-Valid-n, r [10/35] hn+i p[to/x] ^ <?[to/x]. 

VaIid([x:S,r] Ap g) 

Case: C- Valid. x:S,Y \- n p ^ q 

Subcase: S = T, 

Thus, Valid (pi (a;) A |r](x) A p(x) q(x)). 

Thus, J n |= [Tj(x) A [r](x) A p(x) q(x). 

The rest of the reasoning follows the C-VALID-N subcase. 

Subcase: S = VA S'. 

Thus, Valid(trtte A |r](x) Ap(x) =>■ q(x)). 

Thus, Xn \= true A \f\(x) A p(x) => q(x). 

The rest of the reasoning follows the C-VALID-N subcase. 

3j. Valid([x:S,r] Ap Ivjj :: U) 
x:S,r,pr- n U <: U 3 

Case: C-ImpSyn. x:S,T \- n p ^ Wi Ivii :: Ui 

By C- VALID, x : S, T h„+i p ee^ Zto.; :: U. 
ByIH(l),r[w/x] h n+ i p[to/x] ^ Zw.,[to/x] :: tf[to/x]. 
By IH(2),r[w/x],p[TO/x] h n+ i [/[to/x] <: t/j [to/a;]. 
By C-ImpSyn, F[w/x] \- n+1 p[w/x] ^ (V< toj :: (/^[to/x]. 



Proof of (2). Straightforward induction. 



Proof of (3). Straightforward induction, appealing to the equisatis- 
fiability of normalized formulas in the S-MONO case. 

Proof of (4). 

Case: T-CONST. 

By T-CONST, F[w/x] h„+i c :: ty(c). 
By Constant Types (Well-formed), h ty(c), 

so tj/(c) has no free variables. 
Thus, ty(c)[w/x] = ty(c). 
Also, c[w/x] = c, which concludes the case. 

(x:S,T)(y) = T 

Case: T-VAR. x : S, T h n y :: {v\v = y) 

Subcase: i/y. 

By substitution on environments, schemes, types and 

formulas, (T[w/x])(y) = T[w/x]. 
By T-VAR, T[w/x] h„+i y :: {v\v = y}. 
This concludes the subcase since 

y[w/x] = y and {v = y}[w/x] = {u = y}. 

Subcase: x = y. 
Note that x[to/x] = to and {V = x}[to/x] = {u = to}. 
Subsubcase: w — z. 

Impossible, since the typing environment is empty. 
Subsubcase: to = wi ++ {u>2 H W3}. 

Trivial, by T-EXTEND. 

Subsubcase: w = c. 

By T-CONST, r[iu/x] h„+i c :: ty(c). 

By Constant Types (Normal), ty(c) — \y — c A p}. 

By C-VALID and S-MONO, 

T[w/x] h„+i {v = c A p} C = c}. 
By T-SUB, T[w/x] h„+i c :: = c}. 

Subsubcase: to = A2. eo. 

By Inversion, h„+i :: \y = to A v :: [/}. 
By C-VALID and S-MONO, 

h n+ i = w A v :: 17} C = w}. 
ByT-SUB, r-„+i to :: = to}. 
By Weakening, r[to/x] h„+i w :: {y = to}. 

Subsubcase: to = AA eo. 

Impossible, since T is a monotype. 

(x:S,r)(;/)=VASo 

Case: T-VarPoly. x : S, F h„ y :: VA So 

By substitution on environments, schemes, types and 
formulas, (r[TO/x])(y) = (VA S )[to/x]. 

Subcase: 1/5. 

Since x is a term variable, (VA So)[to/x] is a polytype. 
By T-VARPOLY, T[to/x] h„+i y :: (VA S )[w/x]. 
This concludes the subcase, since y[w/x] = w. 

Subcase: x = y. 

Thus, S = VA S . 

Since h S, x does not appear in S. 

Thus, So[to/x] = So- 

The goal follows from T-VarPoly. 

x:S,F,y:Ti h„ e :: T 2 
U = y.Tx^Ti 

Case: T-FUN. x:S, F h„ e :: ji/ = eAi/:: [/} 



Note that in this case, e = Xy. eg. 

Bym(4),r[w/x],y:T 1 [w/x] <r n+1 e [w/x] :: T 2 [w/x]. 
By T-FUN, 

T[w/x] \-„+i e[w/x] :: {v = e[w/x] A v :: U[w/x]}. 
Thus, r[w/x] \~ n +i e[w/x] :: {y = e A v :: J7}[to/a;]. 

i:S,rh„ Mi :: {v :: x :Tn Ti 2 } 
i:5,ri-„ w 2 :: Tn 
Case: T-APP. i:S,rh„ wi 11)2 :: Tis[wa/y] 

Let r' = r[w/x], w'i = toi [w/x], w' 2 = wa[w/x], 

T{ x = T n [w/x], and T{ 2 = T 12 [w/x], 
ByIH(4),r' h„+ito'i :: {u ::y:Tu ->Ti 2 }[w/x]. 
Thus, T' h n+ i toj :: {v :: y:T[ x -+ T[ 2 }. 
ByIH(4),r' h„+i w' 2 :: T[ x . 
By T-APP, T' h n+1 w[ w' 2 :: T[ 2 [w' 2 /y\. 
Now we expand T[ 2 [w' 2 / y] to T\ 2 [w /x] [w 2 /y][w/x]. 
Since w and w 2 are closed values, and x and y are distinct, this 

is the same as T12 [to 2 /v][ w / x ] [w/x]. 
Furthermore, this is (Ti 2 [w 2 /y])[w/x]. 
Finally, we note that w[ w 2 = (toi w 2 )[w/x]. 
Thus, the derivation from T-APP does indeed satisfy the goal. 

x:S,Fh„ e :: S" x:S,T\- n S" C 5' 
Case: T-SUB. i:S,rh„e :: 5" 

ByIH(4),r[w/:r] h n+ i e[w/x] :: S"[w/x]. 
ByIH(3),r[w/x] h n+1 S"[w/x] :: S'[w/x]. 
By T-SUB, F[w/x] h n+ i e[w/x] :: S'[w/x]. 

Cases: T-LET, T-If, T-TFun, T-TApp, T-EXTEND. 
By IH on the premises and original rule to conclude. 

Cases: T-Fold, T-Unfold. 

By IH on the premises and original rule to conclude. 



□ 



In the following lemma we lift instantiation to judgments in the
obvious way. For example, we write $\mathsf{Inst}((\Gamma \vdash_n e :: S), A, T)$ to
mean $\mathsf{Inst}(\Gamma, A, T) \vdash_n e :: \mathsf{Inst}(S, A, T)$.

12 Lemma (Type Substitution). Let $\vdash T$.

1. If $A, \Gamma \vdash_n p \Rightarrow q$, then $\mathsf{Inst}((\Gamma \vdash_n p \Rightarrow q), A, T)$.

2. If $A, \Gamma \vdash_n U_1 <: U_2$, then $\mathsf{Inst}((\Gamma \vdash_n U_1 <: U_2), A, T)$.

3. If $A, \Gamma \vdash_n S_1 \sqsubseteq S_2$, then $\mathsf{Inst}((\Gamma \vdash_n S_1 \sqsubseteq S_2), A, T)$.

4. If $A, \Gamma \vdash_n e :: S$, then $\mathsf{Inst}((\Gamma \vdash_n e :: S), A, T)$.

Proof. By mutual induction. Even (1) is straightforward, since type
variables in the environment play no role in the embedding of
formulas into the logic (they are embedded as true). □

13 Lemma (Canonical Forms). Suppose $\vdash_n w :: S$.

1. If $S = \mathit{Bool}$, then $w = \mathtt{true}$ or $w = \mathtt{false}$.

2. If $S = \{\nu \mid \nu :: x{:}T_1 \to T_2\}$, then either
(a) $w = \lambda x.\ e$ and $x{:}T_1 \vdash_n e :: T_2$, or

(b) $w = c$ and for all $w'$ such that $\vdash_n w' :: T_1$, $\delta(c, w')$ is
defined and $\vdash_n \delta(c, w') :: T_2[w'/x]$.

3. If $S = \forall A.\, S'$, then $w = \lambda A.\ e$ and $A \vdash_n e :: S'$.

Proof of (1). By Satisfiable Typing, $I_{n+1} \models \mathit{tag}(w) = \text{"Bool"}$.
By Boolean Values, $w$ is either true or false. □

Proof of (2). By Satisfiable Typing, $I_{n+1} \models w :: x{:}T_1 \to T_2$. The
goal follows by Type Predicate Interpretation and Constant Types
(App). □



Proof of (3). By induction on the derivation. We consider only the 
rules that can derive a polytype. 

Case: T-TFUN. Immediate. 

Case: T-TAPP. Impossible, since w is a value. 

Case: T-VarPoly. Impossible, since the environment is empty.

Case: T-SUB.

$$\frac{\vdash_n w :: S_0 \qquad \vdash_n S_0 \sqsubseteq \forall A.\, S'}{\vdash_n w :: \forall A.\, S'}$$

The subtyping derivation can only conclude by S-POLY.
From its premises, $S_0 = \forall A.\, S''$ where $A \vdash_n S'' \sqsubseteq S'$.
By IH, $w = \lambda A.\ e$ and $A \vdash_n e :: S''$.
By T-SUB, $A \vdash_n e :: S'$.

□ 

We are now ready to prove the following type soundness theorem 
that combines progress and preservation. Soundness of the basic 
type system, which is the system at level zero and is used for 
typechecking source programs, follows as a corollary. 

14 Theorem (System D* Type Soundness).

If $\vdash_n e :: S$, then either $e$ is a value or $e \hookrightarrow e'$ and $\vdash_{n+1} e' :: S$.

Proof. By induction on the typing derivation. 
Cases: T-Var, T-VarPoly.

Impossible, since the typing environment is empty. 
Cases: T-CONST, T-EXTEND, T-FUN, T-TFUN, T-FOLD. 

Immediate, since e is a value. 

Cases: T-Unfold. 

By Satisfiable Typing and Type Predicate Interpretation, e is a 
value. 

Case: T-IF.

$$\frac{\vdash_n w :: \mathit{Bool} \qquad w = \mathtt{true} \vdash_n e_1 :: S \qquad w = \mathtt{false} \vdash_n e_2 :: S}{\vdash_n \mathtt{if}\ w\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 :: S}$$

By Canonical Forms, there are two cases.
Subcase: $w = \mathtt{true}$.
By E-IFTRUE, $e' = e_1$.

Valid(true = true), so by Strengthening, $\vdash_n e_1 :: S$.
By Lifting, $\vdash_{n+1} e_1 :: S$.

Subcase: $w = \mathtt{false}$.

By E-IFFALSE, $e' = e_2$.

Valid(false = false), so by Strengthening, $\vdash_n e_2 :: S$.
By Lifting, $\vdash_{n+1} e_2 :: S$.

Case: T-APP.

$$\frac{\vdash_n w_1 :: \{\nu :: x{:}T_{11} \to T_{12}\} \qquad \vdash_n w_2 :: T_{11}}{\vdash_n w_1\ w_2 :: T_{12}[w_2/x]}$$

By Canonical Forms, there are two cases.

Subcase: $w_1 = \lambda x.\ e_0$ and $x{:}T_{11} \vdash_n e_0 :: T_{12}$.

By Value Substitution, $\vdash_{n+1} e_0[w_2/x] :: T_{12}[w_2/x]$.

This concludes the subcase, since by E-APP, $e' = e_0[w_2/x]$.

Subcase: $w_1 = c$.

Since $\vdash_n w_2 :: T_{11}$, we are also given that $\delta(c, w_2)$
is defined and $\vdash_n \delta(c, w_2) :: T_{12}[w_2/x]$.
By Lifting, $\vdash_{n+1} \delta(c, w_2) :: T_{12}[w_2/x]$.
This concludes the subcase, since by E-DELTA, $e' = \delta(c, w_2)$.



Case: T-TAPP.

$$\frac{\vdash_n w' :: \forall A.\, S'}{\vdash_n w'[T] :: \mathsf{Inst}(S', A, T)}$$

By Canonical Forms, $w' = \lambda A.\ e_0$ and $A \vdash_n e_0 :: S'$.

By Type Substitution, $\vdash_n e_0 :: \mathsf{Inst}(S', A, T)$.

By Lifting, $\vdash_{n+1} e_0 :: \mathsf{Inst}(S', A, T)$.

This concludes the case, since by T-TAPP, $e' = e_0$.

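Case: T-LET.

$$\frac{\vdash S_1 \qquad \vdash_n e_1 :: S_1 \qquad \vdash S_2 \qquad x{:}S_1 \vdash_n e_2 :: S_2}{\vdash_n \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 :: S_2}$$

By the IH, there are two cases.
Subcase: $e_1$ is a value $w$.
By E-LET, $e' = e_2[w/x]$.

By Value Substitution, $\vdash_{n+1} e_2[w/x] :: S_2[w/x]$.

Since $\vdash S_2$, $x$ does not appear free in $S_2$, so $S_2[w/x] = S_2$.

Subcase: $e_1 \hookrightarrow e_1'$ and $\vdash_{n+1} e_1' :: S_1$.

By E-COMPAT, $e' = \mathtt{let}\ x = e_1'\ \mathtt{in}\ e_2$.

By Lifting, $x{:}S_1 \vdash_{n+1} e_2 :: S_2$.

By T-LET, $\vdash_{n+1} \mathtt{let}\ x = e_1'\ \mathtt{in}\ e_2 :: S_2$.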

Case: T-SUB.

$$\frac{\vdash_n e :: S' \qquad \vdash_n S' \sqsubseteq S}{\vdash_n e :: S}$$

By IH, Lifting, and T-SUB.

□ 

15 Corollary (System D Type Soundness). 

If $\vdash_0 e :: S$, then either $e$ diverges or $e \hookrightarrow^* w$ and $\vdash_* w :: S$.

Proof. Follows from System D* Type Soundness. □ 



B. Algorithmic Typing 

A type checker for System D cannot directly implement the 
declarative type system for a couple of reasons. First, the typing 
rules are not syntax-directed because of T-SUB and T-UNFOLD, 
which can apply to any expression e, and C-ImpSyn, which non- 
deterministically refers to a type term U. Second, the syntax of 
values lacks type annotations, so the premises of rules like T-FUN, 
T-LET, and T-IF manipulate types that cannot be inferred by the
syntax of the expression being checked. 

In this section, we define an algorithmic version of the type 
system. First, we extend the syntax of the language with optional 
type annotations for binding constructs and for constructed data. 
Next, we show how to implement the non-deterministic C-ImpSyn
rule. Then, we define an algorithmic type system without the non- 
deterministic T-SUB and T-UNFOLD rules. To eliminate the for- 
mer, we derive unique types and then add explicit subtyping checks 
in the typing rules that require them. To eliminate the latter, we 
eagerly attempt to unfold the types of bindings in anticipation 
of where T-UNFOLD might be needed. Furthermore, although we 
could require that all binding constructs and constructed data be 
annotated with types, this would lead to redundant and tedious type 
annotations. Instead, we define a bidirectional type system in the 
style of [25] that locally infers type annotations where possible. 

B.l Syntax 

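We extend the syntax of System D as follows.

$$\begin{array}{rcll}
w & ::= & \cdots & \text{Values} \\
  & \mid & \lambda x{:}T.\ e & \text{annotated function} \\
  & \mid & C[\overline{T}](\overline{w}) & \text{annotated constructed data} \\
e & ::= & \cdots & \text{Expressions} \\
  & \mid & \mathtt{let}\ x{:}S = e_1\ \mathtt{in}\ e_2 & \text{annotated let-binding}
\end{array}$$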



B.2 Subtyping 

The algorithmic subtyping rules for System D are shown in
Figure 7. The derivation rules of the algorithmic subtyping, clause
implication, and syntactic subtyping relations are analogous to their
counterparts in the declarative system, except that they include
an additional input $\mathcal{U}$, which is a set of type terms $U$. To begin
the discussion, this additional input $\mathcal{U}$ should be ignored, and the
procedure $\mathsf{Extend}(\Gamma, x, S)$ can be assumed to extend a type
environment in the usual way, that is, $\Gamma, x{:}S$; we will return to both of
these issues shortly.

Type Extraction. We now show how CA-ImpSyn implements
the non-deterministic C-ImpSyn rule. First, we define the procedure
TypeTerms that traverses the environment $\Gamma$ and syntactically
collects all of its type terms $U$.

$$\mathsf{TypeTerms}(\Gamma, x{:}\{\nu \mid p\}) = \mathsf{TypeTerms}(\Gamma) \cup \mathsf{TypeTerms}(p)$$

The interesting case for formulas is for type predicates:

$$\mathsf{TypeTerms}(lw :: U) = \{U\}$$

Notice that types contained within $U$ are not collected; only
"top-level type terms" are.

The CA-ImpSyn rule then uses the following MustFlow procedure
to compute which type terms $U$ out of all possible type terms
in the environment (ignoring the "$\setminus\ \mathcal{U}$" part for now) are such that
the solver can prove $w :: U$ is true for all values $w$ of type $T$:

$$\mathsf{MustFlow}(\Gamma, T, \mathcal{U}) = \{U \in \mathcal{U}' \mid \mathsf{Valid}([\![\Gamma, x{:}T]\!] \Rightarrow x :: U)\}$$
where $\mathcal{U}' = \mathsf{TypeTerms}(\Gamma) \setminus \mathcal{U}$
and $x$ is fresh
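The two procedures above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the formula classes `HasType`, `And`, and `Atom`, the representation of an environment as a list of `(variable, refinement)` pairs, and the `valid` callback (which stands in for the SMT validity query) are all assumptions made for this sketch. Type terms are opaque strings, so anything nested inside a term is never collected, matching the "top-level only" remark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HasType:
    """Type predicate `value :: term`; `term` names a top-level type term U."""
    value: str
    term: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Atom:
    """Any formula without a type predicate, e.g. an equality."""
    text: str


def type_terms(formula):
    """Collect the top-level type terms that occur in a formula."""
    if isinstance(formula, HasType):
        return {formula.term}
    if isinstance(formula, And):
        return type_terms(formula.left) | type_terms(formula.right)
    return set()


def env_type_terms(env):
    """env is a list of (variable, refinement formula) bindings."""
    terms = set()
    for _var, refinement in env:
        terms |= type_terms(refinement)
    return terms


def must_flow(env, typ, used, valid):
    """Unused type terms that `valid` proves for every value of type `typ`.

    `valid(env, typ, term)` is a hypothetical oracle standing in for the
    query Valid([[env, x:typ]] => x :: term) with x fresh.
    """
    candidates = env_type_terms(env) - used
    return {term for term in candidates if valid(env, typ, term)}


def toy_valid(env, typ, term):
    # Toy oracle for the demo only: pretend the solver proves exactly "U".
    return term == "U"


env = [("y", Atom("true")), ("x", And(Atom("v = y"), HasType("v", "U")))]
extracted = must_flow(env, "{v | v = y}", set(), toy_valid)
```

Passing `{"U"}` as the already-used set makes the same call return the empty set, which is the behavior the termination argument later relies on.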



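Algorithmic Subtyping   $\Gamma; \mathcal{U} \vdash S_1 \sqsubseteq S_2$

$$\frac{\begin{array}{c} x \text{ fresh} \qquad p_1' = p_1[x/\nu] \qquad p_2' = p_2[x/\nu] \\ \mathsf{Normalize}(p_2') = \wedge_i\, (q_i \Rightarrow r_i) \qquad \forall i.\ \Gamma, p_1'; \mathcal{U} \vdash q_i \Rightarrow r_i \end{array}}{\Gamma; \mathcal{U} \vdash \{\nu \mid p_1\} \sqsubseteq \{\nu \mid p_2\}}\ \text{[SA-Mono]}$$

$$\frac{\Gamma; \mathcal{U} \vdash S_1 \sqsubseteq S_2}{\Gamma; \mathcal{U} \vdash \forall A.\, S_1 \sqsubseteq \forall A.\, S_2}\ \text{[SA-Poly]}$$

Algorithmic Clause Implication   $\Gamma; \mathcal{U} \vdash q \Rightarrow r$

$$\frac{\mathsf{Valid}([\![\Gamma]\!] \wedge q \Rightarrow r)}{\Gamma; \mathcal{U} \vdash q \Rightarrow r}\ \text{[CA-Valid]}$$

$$\frac{\exists j.\ \mathcal{U}' = \mathsf{MustFlow}(\Gamma, \{\nu \mid \nu = lw_j\}, \mathcal{U}) \qquad \exists U \in \mathcal{U}'.\ \Gamma, q; \mathcal{U} \cup \mathcal{U}' \vdash U <: U_j}{\Gamma; \mathcal{U} \vdash q \Rightarrow \vee_i\ lw_i :: U_i}\ \text{[CA-ImpSyn]}$$

Algorithmic Syntactic Subtyping   $\Gamma; \mathcal{U} \vdash U_1 <: U_2$

$$\frac{\Gamma; \mathcal{U} \vdash T_{21} \sqsubseteq T_{11} \qquad \mathsf{Extend}(\Gamma, x_2, T_{21}); \mathcal{U} \vdash T_{12} \sqsubseteq T_{22}}{\Gamma; \mathcal{U} \vdash x_1{:}T_{11} \to T_{12} <: x_2{:}T_{21} \to T_{22}}\ \text{[UA-Arrow]}$$

$$\frac{}{\Gamma; \mathcal{U} \vdash A <: A}\ \text{[UA-Var]} \qquad \frac{}{\Gamma; \mathcal{U} \vdash \mathit{Null} <: C[\overline{T}]}\ \text{[UA-Null]}$$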



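$$\frac{\begin{array}{c} \Psi(C) = [\overline{\theta A}]\{\overline{f{:}T}\} \\ \forall i.\ \text{if } \theta_i \in \{+, =\} \text{ then } \Gamma; \mathcal{U} \vdash T_{1i} \sqsubseteq T_{2i} \\ \forall i.\ \text{if } \theta_i \in \{-, =\} \text{ then } \Gamma; \mathcal{U} \vdash T_{2i} \sqsubseteq T_{1i} \end{array}}{\Gamma; \mathcal{U} \vdash C[\overline{T_1}] <: C[\overline{T_2}]}\ \text{[UA-Datatype]}$$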



Figure 7. Algorithmic subtyping for System D 

That is, CA-ImpSyn tries all type terms U that C-ImpSyn might 
possibly refer to. 

Termination. We now turn to the question of whether algorithmic 
subtyping terminates. Because the subtyping, implication, and syn- 
tactic subtyping relations are mutually defined, we may worry that 
it is possible to construct an implication query (and hence a subtyp- 
ing obligation) which is non-terminating. Indeed, a naive approach 
to deciding implications over type predicates using the above strate- 
gies (without considering the U parameters) may not terminate. In 
the following, we write judgments without the U parameters to see 
what goes wrong when they are not considered. 
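Consider the environment

$$\Gamma = y : \mathit{Top},\ x : \{\nu \mid \nu = y \wedge \nu :: U\}$$

where $U = a{:}\{\nu \mid \nu :: b{:}\{\nu \mid \nu = y\} \to \mathit{Top}\} \to \mathit{Top}$

and suppose we wish to check that

$$\Gamma \vdash \mathit{true} \Rightarrow y :: x{:}\{\nu \mid \nu = y\} \to \mathit{Top}. \qquad (5)$$

CA-VALID cannot derive this judgment, since the implication

$$[\![\Gamma]\!] \wedge \mathit{true} \Rightarrow y :: x{:}\{\nu \mid \nu = y\} \to \mathit{Top}$$

is not valid. Thus, we must try to derive Equation 5 by CA-ImpSyn.
Type extraction derives that $y :: U$ in $\Gamma$, so the remaining
obligation is

$$\Gamma \vdash U <: x{:}\{\nu \mid \nu = y\} \to \mathit{Top}.$$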



Because of the contravariance of function subtyping on the left-hand
side of the arrow, the following judgment must be derivable:

$$\Gamma \vdash \{\nu \mid \nu = y\} \sqsubseteq \{\nu \mid \nu :: b{:}\{\nu \mid \nu = y\} \to \mathit{Top}\}.$$

After SA-MONO substitutes a fresh variable, say $\nu'$, for $\nu$ in both
types, this reduces to the clause implication obligation

$$\Gamma, \nu' = y \vdash \mathit{true} \Rightarrow \nu' :: b{:}\{\nu \mid \nu = y\} \to \mathit{Top}.$$

Alas, this is essentially Equation 5, so we are stuck in an infinite
loop! We will again extract the type $U$ for $y$ (aliased to $\nu'$ here) and
repeat the process ad infinitum.

This situation arises because we are allowed to invoke the rule 
CA-ImpSyn infinitely many times. Then it must also be the case 
that CA-ImpSyn extracts a single type term from the environment 
infinitely often, since there are only finitely many in the environ- 
ment. Thus, to ensure termination, we make the restriction that 
along any branch of a subtyping derivation, a type term may be 
extracted from the environment at most once. This is the purpose 
of the set U that is propagated through subtyping judgments; the 
MustFlow procedure excludes from consideration any type terms 
in the set U of already-used type terms. Notice that in the CA- 
ImpSyn rule, the results of the call to MustFlow are included in 
the already-used set of the syntactic subtyping judgment. 
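The effect of the already-used set can be illustrated with a small Python model. It is a sketch under strong simplifying assumptions: goals and type terms are plain strings, and the hypothetical table `rules` records, for each goal, which type terms extraction would return together with the subgoals each one generates. The solver and the subtyping rules themselves are not modeled.

```python
def derivable(goal, rules, used=frozenset()):
    """Search for a derivation, extracting each type term at most once per branch.

    `rules` maps a goal to a list of (type_term, subgoals) alternatives.
    Terms already in `used` are excluded, mirroring how MustFlow removes
    already-used terms; every term returned by this "extraction" is then
    added to the used set for the rest of the branch.
    """
    fresh = [(term, subs) for term, subs in rules.get(goal, ()) if term not in used]
    branch_used = used | {term for term, _ in fresh}
    return any(
        all(derivable(sub, rules, branch_used) for sub in subs)
        for _term, subs in fresh
    )


# The looping example: proving goal5 extracts U, which regenerates goal5.
looping = {"goal5": [("U", ["goal5"])]}
loop_result = derivable("goal5", looping)

# A goal whose extracted term needs no further subgoals is still derivable.
simple = {"g": [("U", [])]}
simple_result = derivable("g", simple)
```

Without the `used` parameter, the first call would recurse forever; with it, the second visit to `goal5` finds no unused type term and the search fails finitely. The recursion depth is bounded by the number of distinct type terms, since each level that continues adds at least one term to the used set.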

B.3 Bidirectional Type Checking 

In this section, we define an algorithm for type checking programs
where type annotations for binding constructs and constructed data
expressions may or may not be provided. Following work on local
type inference [25], our type checking algorithm is split into two
mutually-dependent parts: a type synthesis relation $\Gamma \vdash e \triangleright S$ that,
given an expression $e$, a type environment $\Gamma$, and no information
about the expected type of $e$, attempts to synthesize, or derive, a
well-formed type $S$; and a type conversion relation $\Gamma \vdash e \triangleleft S$
that, in addition to $e$ and $\Gamma$, takes a type $S$ that is required of $e$,
and checks whether or not $e$ can indeed be given type $S$. Thus, $S$
is an output of a synthesis judgment but an input to a conversion
judgment. We will highlight some of the more interesting cases of
the type checking relations after dealing with two issues.

Inconsistent Type Environments. Recall that the type extraction
procedure collects the type terms $U$ such that $\mathsf{Valid}([\![\Gamma, x{:}T]\!] \Rightarrow x :: U)$.
If the environment $\Gamma, x{:}T$ happens to be inconsistent, then
all such implications will be valid. As we will see, our typing rules
for function application will depend on type extraction returning
exactly one syntactic arrow, which will not be the case in an
inconsistent environment. This is a precision issue that we avoid by
simply not performing type extraction when in an inconsistent
environment. To this end, both the synthesis and conversion algorithms
start off by checking whether the environment is inconsistent, and
if it is, they trivially succeed.

$$\frac{\mathsf{Valid}([\![\Gamma]\!] \Rightarrow \mathit{false})}{\Gamma \vdash e \triangleright \{\mathit{false}\}}\ \text{[TS-False]} \qquad \frac{\mathsf{Valid}([\![\Gamma]\!] \Rightarrow \mathit{false})}{\Gamma \vdash e \triangleleft S}\ \text{[TC-False]}$$

These rules are sound because when the environment is inconsistent,
the underlying implications can be discharged by CA-VALID
anyway.

Unfolding. Unlike T-SUB, uses of T-UNFOLD cannot be factored
into other typing rules, since we cannot syntactically predict where
it is needed. It is not sufficient, for example, to unfold type
definitions only at uses of variables (that is, in the typing rule for
variables). To demonstrate, consider the function

let get_hd x = get x "hd"

and an attempt to assign it the type

$$\mathtt{get\_hd} :: x{:}\{\nu \neq \mathit{null} \wedge \nu :: \mathit{List}[\mathit{Top}]\} \to \{\nu = \mathit{sel}(x, \text{"hd"})\}.$$

Say we unfold the type $\mathit{List}[\mathit{Top}]$ at the use of $x$, when it is passed
to the get function. By the definition of $\mathsf{Unfold}(\mathit{List}, \mathit{Top})$, we
obtain

$$x \neq \mathit{null} \Rightarrow (\mathit{tag}(x) = \text{"Dict"} \wedge \mathit{has}(x, \text{"hd"}) \wedge \mathit{has}(x, \text{"tl"}))$$

which, together with the assumption that $x \neq \mathit{null}$, allows the call
to get to typecheck. Then, to check the subsequent call with
argument "hd", we require that $\mathit{has}(x, \text{"hd"})$. The unfolded formula
is sufficient to prove this, but it is no longer in the environment of
logical assumptions, since it was not recorded in the type
environment.

Languages like ML leverage pattern matching to determine 
exactly when to unfold type definitions. We do not have this option, 
however, since our core language does not include a syntactic form 
for unpacking constructed data. Instead, we eagerly try to unfold 
type definitions every time a variable is added to the environment. 
We define a procedure Extend that, in addition to extending a type 
environment as usual, uses type extraction to determine whether the 
variable has a constructed type and, if it does, unfolds and records 
its type definition. 



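$$\mathsf{Extend}(\Gamma, x, T) = \Gamma,\ x{:}T,\ \bigwedge_{C[\overline{T'}] \in \mathcal{U}} \mathsf{Unfold}(C, \overline{T'})[x/\nu] \qquad \text{where } \mathcal{U} = \mathsf{MustFlow}(\Gamma, \{\nu = x\}, \emptyset)$$

$$\mathsf{Extend}(\Gamma, x, \forall A.\, S) = \Gamma,\ x{:}\forall A.\, S$$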

Constants and Variables. We now consider some of the algorith- 
mic typing rules. For non-function values, the synthesis rules are 
similar to the declarative typing rules, whereas the conversion rules 
invoke synthesis and then call into subtyping to check the synthe- 
sized type against the goal. 



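$$\frac{}{\Gamma \vdash c \triangleright \mathit{ty}(c)}\ \text{[TS-Const]} \qquad \frac{\Gamma(x) = T}{\Gamma \vdash x \triangleright \{\nu = x\}}\ \text{[TS-Var]}$$

$$\frac{\Gamma \vdash c \triangleright S' \qquad \Gamma; \emptyset \vdash S' \sqsubseteq S}{\Gamma \vdash c \triangleleft S}\ \text{[TC-Const]} \qquad \frac{\Gamma \vdash x \triangleright T' \qquad \Gamma; \emptyset \vdash T' \sqsubseteq T}{\Gamma \vdash x \triangleleft T}\ \text{[TC-Var]}$$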



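Functions. The synthesis rule for annotated functions is
straightforward. The best we can do when the function binder $x$ is not
annotated is try to typecheck the body assuming that $x$ has type
$\mathit{Top}$.

$$\frac{\Gamma \vdash T_1 \qquad \mathsf{Extend}(\Gamma, x, T_1) \vdash e \triangleright T_2}{\Gamma \vdash \lambda x{:}T_1.\ e \triangleright \{\nu :: x{:}T_1 \to T_2\}}\ \text{[TS-FunAnn]}$$

$$\frac{\mathsf{Extend}(\Gamma, x, \mathit{Top}) \vdash e \triangleright T_2}{\Gamma \vdash \lambda x.\ e \triangleright \{\nu :: x{:}\mathit{Top} \to T_2\}}\ \text{[TS-FunBare]}$$

When checking whether a function, annotated or not, can be
converted to a particular type $T$, we require that $T$ syntactically have
the form $\{\nu :: U\}$ where $U$ is an arrow. This seems to be a
reasonable source-level requirement, but it could be loosened if needed.

$$\frac{\Gamma \vdash T \qquad \Gamma; \emptyset \vdash T_1 \sqsubseteq T \qquad \mathsf{Extend}(\Gamma, x, T_1) \vdash e \triangleright T_2}{\Gamma \vdash \lambda x{:}T.\ e \triangleleft \{\nu :: x{:}T_1 \to T_2\}}\ \text{[TC-FunAnn]}$$

$$\frac{\mathsf{Extend}(\Gamma, x, T_1) \vdash e \triangleright T_2}{\Gamma \vdash \lambda x.\ e \triangleleft \{\nu :: x{:}T_1 \to T_2\}}\ \text{[TC-FunBare]}$$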



Function Applications. The cases for application are the most
distinctive in our setting. To synthesize an application, we must be able
to synthesize a type $T_1$ for the function $w_1$ and use type extraction
to convert $T_1$ to a syntactic arrow. The procedure MustFlow can
return an arbitrary number of syntactic type terms, so we must
decide how to proceed in the event that $T_1$ can be extracted to
multiple different arrow types. To avoid the need for backtracking
in the type checker, and to provide a semantics that is simple for
the programmer to understand and use, we consider an application
$w_1\ w_2$ to be well-typed if there is exactly one syntactic arrow that
is applicable for the given argument $w_2$.

Determining what is "applicable" separates into two cases. In
the case that we can synthesize a type $T_2$ for $w_2$, we use the
following procedure that succeeds if there is exactly one arrow in
the set $\mathcal{U}$ of type terms with a domain that is a supertype of $T_2$.

$$\mathsf{FilterByArgTyp}(\Gamma, \mathcal{U}, T_2) = \begin{cases} x{:}T_{11} \to T_{12} & \text{if } x{:}T_{11} \to T_{12} \text{ is the only } U \in \mathcal{U} \text{ such that } \Gamma; \emptyset \vdash T_2 \sqsubseteq T_{11} \\ \mathit{fail} & \text{otherwise} \end{cases}$$



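The first synthesis rule for application uses this procedure to
derive an output type for the call. (We write parentheses around the
last premise, because it is not needed; it is implied by the successful
FilterByArgTyp call. We include the premise in the rule for clarity.)

$$\frac{\begin{array}{c} \Gamma \vdash w_1 \triangleright T_1 \qquad \Gamma \vdash w_2 \triangleright T_2 \qquad \mathcal{U} = \mathsf{MustFlow}(\Gamma, T_1, \emptyset) \\ x{:}T_{11} \to T_{12} = \mathsf{FilterByArgTyp}(\Gamma, \mathcal{U}, T_2) \qquad (\Gamma; \emptyset \vdash T_2 \sqsubseteq T_{11}) \end{array}}{\Gamma \vdash w_1\ w_2 \triangleright T_{12}[w_2/x]}\ \text{[TS-App1]}$$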
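The "exactly one applicable arrow" policy of FilterByArgTyp can be sketched as follows. This is an illustrative model rather than the checker's code: arrows are hypothetical `(domain, codomain)` pairs of type names, and `is_subtype` is a caller-supplied stand-in for the algorithmic subtyping judgment, here instantiated with a toy three-element hierarchy.

```python
def filter_by_arg_type(arrows, arg_type, is_subtype):
    """Return the unique arrow whose domain is a supertype of arg_type.

    `arrows` is a collection of (domain, codomain) pairs. Returns None
    (modeling `fail`) when no arrow or more than one arrow is applicable,
    so the caller never has to backtrack over alternatives.
    """
    applicable = [arrow for arrow in arrows if is_subtype(arg_type, arrow[0])]
    return applicable[0] if len(applicable) == 1 else None


# Toy subtyping for the demo only: Int <: Num <: Top, plus reflexivity.
SUPERTYPES = {"Int": {"Int", "Num", "Top"}, "Num": {"Num", "Top"},
              "Str": {"Str", "Top"}, "Top": {"Top"}}


def toy_subtype(sub, sup):
    return sup in SUPERTYPES[sub]


unique = filter_by_arg_type([("Str", "Bool"), ("Num", "Int")], "Int", toy_subtype)
ambiguous = filter_by_arg_type([("Num", "Int"), ("Top", "Bool")], "Int", toy_subtype)
```

In the second call both domains are supertypes of `Int`, so the procedure fails rather than choosing one, which reflects the design choice of avoiding backtracking described above.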



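In the case that we cannot synthesize a type for $w_2$, we use the
following procedure that succeeds if there is exactly one arrow in
$\mathcal{U}$ with a domain type that $w_2$ can be converted to.

$$\mathsf{FilterByArgVal}(\Gamma, \mathcal{U}, w_2) = \begin{cases} x{:}T_{11} \to T_{12} & \text{if } x{:}T_{11} \to T_{12} \text{ is the only } U \in \mathcal{U} \text{ such that } \Gamma \vdash w_2 \triangleleft T_{11} \\ \mathit{fail} & \text{otherwise} \end{cases}$$

The second synthesis rule for application uses this procedure to
derive an output type for the call.

$$\frac{\begin{array}{c} \Gamma \vdash w_1 \triangleright T_1 \qquad \mathcal{U} = \mathsf{MustFlow}(\Gamma, T_1, \emptyset) \\ x{:}T_{11} \to T_{12} = \mathsf{FilterByArgVal}(\Gamma, \mathcal{U}, w_2) \qquad (\Gamma \vdash w_2 \triangleleft T_{11}) \end{array}}{\Gamma \vdash w_1\ w_2 \triangleright T_{12}[w_2/x]}\ \text{[TS-App2]}$$

Type conversion for an application can proceed in two ways, if
either the type of the function or argument can be synthesized. The
first case, when the function type can be synthesized to an arrow, is
similar to TS-APP2 with an additional subtyping check.

$$\frac{\begin{array}{c} \Gamma \vdash w_1 \triangleright T_1 \qquad \mathcal{U} = \mathsf{MustFlow}(\Gamma, T_1, \emptyset) \\ x{:}T_{11} \to T_{12} = \mathsf{FilterByArgVal}(\Gamma, \mathcal{U}, w_2) \qquad (\Gamma \vdash w_2 \triangleleft T_{11}) \\ \Gamma; \emptyset \vdash T_{12}[w_2/x] \sqsubseteq T \end{array}}{\Gamma \vdash w_1\ w_2 \triangleleft T}\ \text{[TC-App1]}$$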



In the second case, when we can synthesize a type $T_2$ for the
argument, we combine $T_2$ with the goal $T$ to infer a plausible arrow
type for the function. Notice that we use a dummy formal parameter
$x$, since we have no (reasonable) way of computing where $x$ might
have appeared in $T$ before substituting $w_2$ for $x$.

$$\frac{\Gamma \vdash w_2 \triangleright T_2 \qquad x \text{ fresh} \qquad \Gamma \vdash w_1 \triangleleft \{\nu \mid \nu :: x{:}T_2 \to T\}}{\Gamma \vdash w_1\ w_2 \triangleleft T}\ \text{[TC-App2]}$$



If-expressions. We can synthesize a precise type for if-expressions 
by tracking the guard predicates in the output type. Type conversion 
for if-expressions is straightforward. 

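$$\frac{\begin{array}{c} \Gamma \vdash w \triangleleft \mathit{Bool} \qquad \Gamma, w = \mathtt{true} \vdash e_1 \triangleright \{\nu \mid p_1\} \qquad \Gamma, w = \mathtt{false} \vdash e_2 \triangleright \{\nu \mid p_2\} \\ q = (w = \mathtt{true} \Rightarrow p_1 \wedge w = \mathtt{false} \Rightarrow p_2) \end{array}}{\Gamma \vdash \mathtt{if}\ w\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \triangleright \{\nu \mid q\}}\ \text{[TS-If]}$$

$$\frac{\Gamma \vdash w \triangleleft \mathit{Bool} \qquad \Gamma, w = \mathtt{true} \vdash e_1 \triangleleft T \qquad \Gamma, w = \mathtt{false} \vdash e_2 \triangleleft T}{\Gamma \vdash \mathtt{if}\ w\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \triangleleft T}\ \text{[TC-If]}$$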
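As a small illustration of how TS-IF tracks the guard in the output type, the following Python sketch builds the refinement $q$ from the two branch predicates. The tuple encoding of formulas and the helper names `if_refinement` and `render` are assumptions made for this example only.

```python
def if_refinement(guard, p1, p2):
    """Build q = (guard = true => p1) /\\ (guard = false => p2)."""
    return ("and",
            ("imp", ("eq", guard, "true"), p1),
            ("imp", ("eq", guard, "false"), p2))


def render(p):
    """Pretty-print a formula encoded as nested tuples of strings."""
    if isinstance(p, str):
        return p
    op, *args = p
    if op == "and":
        return " /\\ ".join(render(a) for a in args)
    if op == "imp":
        return f"({render(args[0])} => {render(args[1])})"
    if op == "eq":
        return f"{render(args[0])} = {render(args[1])}"
    # Anything else is treated as a function or predicate application.
    return f"{op}({', '.join(render(a) for a in args)})"


# Branch predicates for: if b then (get x "f") else 0
then_pred = ("eq", "v", ("sel", "x", '"f"'))
else_pred = ("eq", "v", "0")
q = if_refinement("b", then_pred, else_pred)
text = render(q)
```

Here `text` is `(b = true => v = sel(x, "f")) /\ (b = false => v = 0)`; note that it mentions the guard variable `b`, which is exactly what makes such types problematic once `b` goes out of scope.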



Let-expressions. The rules for let-expressions share a similar 
structure. The choice whether to use synthesis or conversion on 
the equation expression ei depends on whether there is an annota- 
tion S or not. The choice for the body expression e 2 depends on 
the kind of derivation for the overall let-expression. Whenever a 
let-binding contains an annotation S, we must check that it is well- 
formed. The synthesis rules for both kinds of let-bindings must 
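also check that the synthesized type $T$ is well-formed in $\Gamma$, since
we need to ensure that synthesized types are always well-formed in
their environment.

$$\frac{\Gamma \vdash S \qquad \Gamma \vdash e_1 \triangleleft S \qquad \mathsf{Extend}(\Gamma, x, S) \vdash e_2 \triangleright T \qquad \Gamma \vdash T}{\Gamma \vdash \mathtt{let}\ x{:}S = e_1\ \mathtt{in}\ e_2 \triangleright T}\ \text{[TS-LetAnn-1]}$$

$$\frac{\Gamma \vdash S \qquad \Gamma \vdash e_1 \triangleleft S \qquad \mathsf{Extend}(\Gamma, x, S) \vdash e_2 \triangleleft T}{\Gamma \vdash \mathtt{let}\ x{:}S = e_1\ \mathtt{in}\ e_2 \triangleleft T}\ \text{[TC-LetAnn]}$$

$$\frac{\Gamma \vdash e_1 \triangleright S \qquad \mathsf{Extend}(\Gamma, x, S) \vdash e_2 \triangleright T \qquad \Gamma \vdash T}{\Gamma \vdash \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 \triangleright T}\ \text{[TS-LetBare-1]}$$

$$\frac{\Gamma \vdash e_1 \triangleright S \qquad \mathsf{Extend}(\Gamma, x, S) \vdash e_2 \triangleleft T}{\Gamma \vdash \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 \triangleleft T}\ \text{[TC-LetBare]}$$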



Because the syntax of System D is A-normal form, programs
will contain many let-expressions. Ideally, our algorithmic type
rules will deal well with bare let-expressions to avoid an
overwhelming and redundant annotation burden. The TS-LetBare-1
rule does not, however, successfully synthesize types in common
situations where we would expect it to. We will show three
problematic examples and then incorporate a simple technique that
supports them.

First, consider the function 

let get_f (x:{tag(ν) = "Dict" ∧ has(ν, "f")}) =
  get x "f"

In A-normal form, this function might be written as

let get_f (x:{tag(ν) = "Dict" ∧ has(ν, "f")}) =
  let a = get x in
  let b = a "f" in
  b

Notice that the function binder is annotated but the let-binders are
not. It seems reasonable to expect that the annotation on $x$ would
be sufficient for type synthesis to derive the type

$$\mathtt{get\_f} :: x{:}\{\mathit{Dict}(\nu) \wedge \mathit{has}(\nu, \text{"f"})\} \to \{\nu = \mathit{sel}(x, \text{"f"})\}$$

but it does not. Consider an attempt to apply TS-LetBare-1 for
the let-expression that binds b. At that point, type synthesis can
derive the type $T = \{\nu = \mathit{sel}(x, \text{"f"})\}$ for the equation expression
a "f". Then, in the type environment extended with $b : T$, TS-VAR
synthesizes the singleton type $\{\nu = b\}$ for the body expression.
But this type is, of course, not well-formed in the type environment
without the binding for b, so the TS-LetBare-1 rule fails. This is
quite unfortunate, since the TS-VAR rule will be used extensively,
and clearly there is a type that we could have used instead of
$\{\nu = b\}$, namely, the type stored for b in the environment!

As a second problematic situation, consider the following vari- 
ation of the previous example. 

let maybe_get_f (x:Dict) =
  if mem x "f" then get x "f" else 0

In A-normal form, this function might be written as

let maybe_get_f (x:Dict) =
  let a = mem x in
  let b = a "f" in
  if b then
    let c = get x in
    c "f"
  else
    0

Again, we have a problem applying the TS-LetBare-1 rule to
the let-expression that binds b. The type synthesized for the
equation a "f" is $T = \{\mathit{Bool}(\nu) \wedge (\nu = \mathtt{true} \Leftrightarrow \mathit{has}(x, \text{"f"}))\}$. To
synthesize the type of the body, the culprit this time is the TS-IF
rule, which derives the type $\{b = \mathtt{true} \Rightarrow \nu = \mathit{sel}(x, \text{"f"}) \wedge b = \mathtt{false} \Rightarrow \nu = 0\}$
that refers to b. We observe that the type $T$ indicates
that it is a boolean flag that records the property $\mathit{has}(x, \text{"f"})$,
so in this case, we would like to replace the problematic body type
with $\{\mathit{has}(x, \text{"f"}) = \mathtt{true} \Rightarrow \nu = \mathit{sel}(x, \text{"f"}) \wedge \mathit{has}(x, \text{"f"}) = \mathtt{false} \Rightarrow \nu = 0\}$.
Furthermore, we might expect to be able to
play this trick quite often, since the shape of $T$,
$\{p = \mathtt{true} \Rightarrow \nu = \mathit{sel}(x, \text{"f"}) \wedge p = \mathtt{false} \Rightarrow \nu = 0\}$ for some formula $p$, is
the same as the return type of several common primitive functions,
including has and =.

The third and final problematic situation that we consider origi- 
nates with a small twist on the previous example. 

let another_maybe_get_f (x:Dict) =
  let a = mem x in
  let b = a "f" in
  let b' = b in
  if b' then
    let c = get x in
    c "f"
  else
    0

This time, the boolean condition used in the if-expression goes
through one more level of indirection, namely, the variable b'.
Thus, when processing the b' let-expression, the type synthesized
by TS-IF for the body expression is
$\{b' = \mathtt{true} \Rightarrow \nu = \mathit{sel}(x, \text{"f"}) \wedge b' = \mathtt{false} \Rightarrow \nu = 0\}$. The type for b', which is
$\{\nu = b\}$, does not, however, match the special shape of boolean
flags from before. The trick we can play this time is to simply
replace b' with b, and derive
$\{b = \mathtt{true} \Rightarrow \nu = \mathit{sel}(x, \text{"f"}) \wedge b = \mathtt{false} \Rightarrow \nu = 0\}$ for the body expression. This type is
well-formed, and when considered as the body expression for the
enclosing let-expression that binds b, will be further rewritten using
the technique for eliminating singletons to the type
$\{\mathit{has}(x, \text{"f"}) = \mathtt{true} \Rightarrow \nu = \mathit{sel}(x, \text{"f"}) \wedge \mathit{has}(x, \text{"f"}) = \mathtt{false} \Rightarrow \nu = 0\}$.

We encapsulate these three simple heuristics in a procedure 
Elim and use it to define the following more precise synthesis rule 
for bare let-bindings. 



$$\frac{\Gamma \vdash e_1 \triangleright S \qquad \mathsf{Extend}(\Gamma, x, S) \vdash e_2 \triangleright T \qquad T' = \mathsf{Elim}(x, S, T) \qquad (\Gamma \vdash T')}{\Gamma \vdash \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 \triangleright T'}\ \text{[TS-LetBare-2]}$$

The procedure $\mathsf{Elim}(x, S, T)$ takes a variable $x$ whose
equation expression has been synthesized to type $S$, and the type $T$
for the body expression, and attempts to remove occurrences of $x$.
When the procedure succeeds, the resulting type is guaranteed to be
well-formed in the environment without $x$. It starts by processing
the top-level refinement predicate.

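$$\mathsf{Elim}(x, S, \{\nu \mid p\}) = \{\nu \mid \mathsf{Elim}(x, S, p)\}$$

The first non-trivial case is for equality predicates that correspond
to the singleton types synthesized by TS-Var.

$$\mathsf{Elim}(x, S, \nu = x) = \begin{cases} p & \text{if } S = \{\nu \mid p\} \\ \mathit{fail} & \text{otherwise} \end{cases}$$

The other non-trivial case is for equality predicates that equate
variables with boolean values, as the TS-IF rule does. The two
cases correspond to whether $S$ matches the canonical shape of
boolean flags or whether $S$ is a singleton type.

$$\mathsf{Elim}(x, S, x = \mathtt{true}) = \begin{cases} p & \text{if } S = \{\mathit{Bool}(\nu) \wedge (\nu = \mathtt{true} \Leftrightarrow p)\} \\ y = \mathtt{true} & \text{if } S = \{\nu = y\} \\ \mathit{fail} & \text{otherwise} \end{cases}$$

$$\mathsf{Elim}(x, S, x = \mathtt{false}) = \begin{cases} \neg p & \text{if } S = \{\mathit{Bool}(\nu) \wedge (\nu = \mathtt{true} \Leftrightarrow p)\} \\ y = \mathtt{false} & \text{if } S = \{\nu = y\} \\ \mathit{fail} & \text{otherwise} \end{cases}$$

The rest of the cases recursively process the formula.

$$\begin{array}{rcl}
\mathsf{Elim}(x, S, F(\overline{lw})) & = & F(\overline{\mathsf{Elim}(x, S, lw)}) \\
\mathsf{Elim}(x, S, lw :: U) & = & \mathsf{Elim}(x, S, lw) :: \mathsf{Elim}(x, S, U) \\
\mathsf{Elim}(x, S, p \wedge q) & = & \mathsf{Elim}(x, S, p) \wedge \mathsf{Elim}(x, S, q)
\end{array}$$

As one final heuristic, we attempt to rewrite occurrences of $x$ that
do not appear in the two kinds of equality predicates that we have
built support for. The following is the non-trivial case for logical
values that replaces the variable $x$ when its type is a singleton.

$$\mathsf{Elim}(x, S, x) = \begin{cases} y & \text{if } S = \{\nu = y\} \\ \mathit{fail} & \text{otherwise} \end{cases}$$

$$\mathsf{Elim}(x, S, y) = y$$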
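The cases of Elim can be read as a recursive rewrite over formulas. The following Python sketch follows the definitions above on a small tuple-encoded formula language; the encoding, the helper names (`singleton_of`, `flag_of`, `ElimFail`), and the choice to represent a monotype $\{\nu \mid p\}$ directly by its predicate $p$ are assumptions of this sketch, and polytypes (for which the first case fails) are not modeled.

```python
class ElimFail(Exception):
    """Raised when no heuristic can remove the variable."""


NU = "v"  # the value variable of refinement types


def singleton_of(S):
    """Return y if S is the singleton type {v = y} for a variable y, else None."""
    if S[0] == "eq" and S[1] == NU and isinstance(S[2], str):
        return S[2]
    return None


def flag_of(S):
    """Return p if S is {Bool(v) /\\ (v = true <=> p)}, else None."""
    if (S[0] == "and" and S[1] == ("Bool", NU)
            and S[2][0] == "iff" and S[2][1] == ("eq", NU, "true")):
        return S[2][2]
    return None


def elim(x, S, p):
    """Remove occurrences of x from p, given that x has type {v | S}."""
    if isinstance(p, str):                     # a logical value
        if p != x:
            return p
        y = singleton_of(S)
        if y is None:
            raise ElimFail(x)
        return y
    if p == ("eq", NU, x):                     # singleton synthesized by TS-Var
        return S
    if p[0] == "eq" and p[1] == x and p[2] in ("true", "false"):
        q = flag_of(S)
        if q is not None:                      # S is a boolean flag for q
            return q if p[2] == "true" else ("not", q)
        y = singleton_of(S)
        if y is not None:                      # S is the singleton {v = y}
            return ("eq", y, p[2])
        raise ElimFail(x)
    # Remaining cases: recurse into the arguments, keeping the operator.
    return (p[0],) + tuple(elim(x, S, q) for q in p[1:])


# Third example from the text: first b' is replaced by b, then b by has(x, "f").
has_f = ("has", "x", '"f"')
body = ("and",
        ("imp", ("eq", "b'", "true"), ("eq", NU, ("sel", "x", '"f"'))),
        ("imp", ("eq", "b'", "false"), ("eq", NU, "0")))
type_b_prime = ("eq", NU, "b")
type_b = ("and", ("Bool", NU), ("iff", ("eq", NU, "true"), has_f))

step1 = elim("b'", type_b_prime, body)
step2 = elim("b", type_b, step1)
```

After the two steps, `step2` no longer mentions `b` or `b'`: its guards are `has(x, "f")` and its negation. A formula such as `v = plus(b, 1)` would raise `ElimFail` under `type_b`, since `b` occurs outside the supported equality shapes and its type is not a singleton.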

If variable elimination fails, we can synthesize Top as a last resort. 



$$\frac{\Gamma \vdash e_1 \triangleright S \qquad \mathsf{Extend}(\Gamma, x, S) \vdash e_2 \triangleright T}{\Gamma \vdash \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 \triangleright \mathit{Top}}\ \text{[TS-LetBare-3]}$$

Since synthesis for annotated let-expressions must also check that the
output type is well-formed, we define two additional rules
TS-LetAnn-2 and TS-LetAnn-3 that are analogous to the
conversion rules.



Constructed Data. We briefly discuss how we infer type param- 
eters that are omitted in constructed data expressions. We extend 
the syntax of type definitions as follows. For every type variable A 
of a type definition for constructor C, we allow exactly one occur- 
rence of A to be marked, written *A, in the definition of C. When 
attempting to synthesize a type for unannotated constructed data, 
we use the positions of marked type variables to match the cor- 
responding positions in the types of the value arguments that are 
used to construct the record. For simplicity, we infer omitted type 
parameters for constructed data only when all type parameters are 
omitted. Therefore, we require that either zero or all of the type 
parameters in a definition are marked. 

For example, we update the List definition as follows to use the
type of the "tl" field to infer the type parameter:

type List[+A]{"hd":{ν :: A}; "tl":{ν :: List[*A]}}

Therefore, if the variable xs has type List[Int], then List(1, xs)
is well-typed; we infer the type argument Int, which is a supertype
of $\{\nu = 1\}$. Notice that putting the marker for A in the type of the
"hd" field would lead to less successful inference, since the type
of an element added to a list will often be more specific than the
type of the rest of the list, and so the inferred type parameter would
be too specific. For example, List(1, xs) would not be well-typed,
since the type $\{\nu = \mathit{xs}\}$ is a subtype of List[Int], but is not a
subtype of List[$\{\nu = 1\}$].

Remaining Rules. We omit the definition of the remaining syn- 
thesis and conversion rules since they do not illuminate any new 
concerns. Although the techniques that we have employed so far 
would allow us to, we do not synthesize type instantiations. 

B.4 Soundness 

We now consider how derivations in the algorithmic type system 
relate to derivations in the declarative type system. We use a proce- 
dure erase to remove type annotations from functions, let-bindings, 
and constructed data because the syntax of the declarative system 
does not permit them. 

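16 Proposition (Sound Algorithmic Typing).

1. If $\Gamma; \mathcal{U} \vdash p \Rightarrow q$, then $\Gamma \vdash p \Rightarrow q$.

2. If $\Gamma; \mathcal{U} \vdash U_1 <: U_2$, then $\Gamma \vdash U_1 <: U_2$.

3. If $\Gamma; \mathcal{U} \vdash S_1 \sqsubseteq S_2$, then $\Gamma \vdash S_1 \sqsubseteq S_2$.

4. If $\Gamma \vdash e \triangleright S$, then $\Gamma \vdash \mathsf{erase}(e) :: S$.

5. If $\Gamma \vdash e \triangleleft S$, then $\Gamma \vdash \mathsf{erase}(e) :: S$.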

Proof sketch. We consider the key aspects of the development of the 
algorithmic type system and provide an intuition for why they are 
sound. To prove that algorithmic clause implication is sound with 
respect to declarative clause implication, we must consider CA- 
ImpSyn and its use of the type extraction procedure. It is easy 
to see that uses of MustFlow can be converted into derivations 
by C-VALID, since it depends on the validity of logical implica- 
tions. Proving that algorithmic subtyping and syntactic subtyping 
are sound with respect to their declarative counterparts goes by in- 
duction on their derivation rules, which correspond one-to-one. 

To prove that type synthesis and type conversion are sound with 
respect to declarative typing, there are a few points to consider. 
The first is the initial check for an inconsistent type environment 
that TS-FALSE and TC-FALSE perform. It is simple to show that 
in the declarative system any judgment is derivable when the type 
environment is inconsistent. The proof is a straightforward
induction, using the C-VALID rule to check that an inconsistent
environment means all clause implications can be proven valid.
Second, we can show that the Extend procedure, which uses type
extraction to unfold type definitions, can be replaced with uses
of T-UNFOLD. Third, we can show that in the TS-LetBare-2
rule, when $\mathsf{Elim}(x, S, T)$ successfully returns a type $T'$, it is
well-formed in $\Gamma$ and, furthermore, since the heuristics employed
soundly replace equality predicates, $\Gamma \vdash e_2 :: T'$. Finally, we can
show that the subtyping premises used in the algorithmic rules can
be replaced with uses of T-SUB.

B.5 Implementation 

We have implemented a prototype checker for System D in ap- 
proximately 2,000 lines of OCaml, using Z3 |@] to discharge SMT 
queries. A noteworthy, but unsurprising, optimization in our
implementation compared to the algorithmic system presented here
is that the environment of logical assumptions is maintained
incrementally. We add and remove assertions to and from the logical
environment whenever the type system manipulates the type
environment, so that by the time the CA-VALID rule needs to check
$\mathsf{Valid}([\![\Gamma]\!] \wedge p \Rightarrow q)$, the formula $[\![\Gamma]\!]$ is already in the background
assumptions of the environment; only $\mathsf{Valid}(p \Rightarrow q)$ needs to be
discharged.
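The incremental scheme can be pictured with a scoped stack of assumptions, in the style of the push/pop scopes that SMT solvers commonly provide. The sketch below is not the prototype's OCaml code: the class `AssumptionStack`, the function `check_valid`, and the `solver` callback are hypothetical names, and the toy solver used in the demo only checks membership.

```python
from contextlib import contextmanager


class AssumptionStack:
    """Background assumptions kept in step with the type environment."""

    def __init__(self):
        self._frames = [[]]

    def add(self, formula):
        """Record the embedding of a new binding in the current scope."""
        self._frames[-1].append(formula)

    @contextmanager
    def scope(self):
        """Open a scope; everything added inside is dropped on exit."""
        self._frames.append([])
        try:
            yield self
        finally:
            self._frames.pop()

    def assumptions(self):
        return [f for frame in self._frames for f in frame]


def check_valid(stack, p, q, solver):
    """Discharge only p => q; the environment is already in the background."""
    return solver(stack.assumptions(), p, q)


def toy_solver(background, p, q):
    # Toy stand-in for an SMT query: q must be an assumption or equal to p.
    return q in background or q == p


stack = AssumptionStack()
stack.add("tag(x) = Dict")
with stack.scope():
    stack.add("has(x, f)")
    inside = check_valid(stack, "true", "has(x, f)", toy_solver)
outside = check_valid(stack, "true", "has(x, f)", toy_solver)
```

Inside the scope the assumption about `has(x, f)` is available, so the check succeeds; once the scope closes it is removed again and the same check fails, without the environment ever being re-embedded from scratch.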



C. Examples 

In this section, we present the original, unadapted source code
corresponding to the noted examples in Sections 1 and 2.

C.l Introduction 

The introduction references the following function from the Dojo 
Javascript library, version 1.6.1 ll3Tll : 

"_base/_loader/loader.js"

d._onto = function(arr, obj, fn){
    if(!fn){
        arr.push(obj);
    }else if(fn){
        var func = (typeof fn == "string") ? obj[fn] : fn;
        arr.push(function(){ func.call(obj); });
    }
}

C.2 Overview 

The toXML example is adapted from the Python 3.2 standard li- 
brary: 



Lib/plistlib.py

class DumbXMLWriter:
    def __init__(self, file, indentLevel=0, indent="\t"):
        self.file = file
        self.stack = []
        self.indentLevel = indentLevel
        self.indent = indent

    def beginElement(self, element):
        self.stack.append(element)
        self.writeln("<%s>" % element)
        self.indentLevel += 1

    def endElement(self, element):
        assert self.indentLevel > 0
        assert self.stack.pop() == element
        self.indentLevel -= 1
        self.writeln("</%s>" % element)

    def simpleElement(self, element, value=None):
        if value is not None:
            value = _escape(value)
            self.writeln("<%s>%s</%s>" % (element, value, element))
        else:
            self.writeln("<%s/>" % element)

    def writeln(self, line):
        if line:
            # plist has fixed encoding of utf-8
            if isinstance(line, str):
                line = line.encode('utf-8')
            self.file.write(self.indentLevel * self.indent)
            self.file.write(line)
        self.file.write(b'\n')

class PlistWriter(DumbXMLWriter):
    def __init__(self, file, indentLevel=0, indent=b"\t", writeHeader=1):
        if writeHeader:
            file.write(PLISTHEADER)
        DumbXMLWriter.__init__(self, file, indentLevel, indent)

    def writeValue(self, value):
        if isinstance(value, str):
            self.simpleElement("string", value)
        elif isinstance(value, bool):
            # must switch for bool before int, as bool is a
            # subclass of int...
            if value:
                self.simpleElement("true")
            else:
                self.simpleElement("false")
        elif isinstance(value, int):
            self.simpleElement("integer", "%d" % value)
        elif isinstance(value, float):
            self.simpleElement("real", repr(value))
        elif isinstance(value, dict):
            self.writeDict(value)
        elif isinstance(value, Data):
            self.writeData(value)
        elif isinstance(value, datetime.datetime):
            self.simpleElement("date", _dateToString(value))
        elif isinstance(value, (tuple, list)):
            self.writeArray(value)
        else:
            raise TypeError("unsupported type: %s" % type(value))

    def writeData(self, data):
        self.beginElement("data")
        self.indentLevel -= 1
        maxlinelength = 76 - len(self.indent.replace(b"\t", b" " * 8) *
                                 self.indentLevel)
        for line in data.asBase64(maxlinelength).split(b"\n"):
            if line:
                self.writeln(line)
        self.indentLevel += 1
        self.endElement("data")

    def writeDict(self, d):
        self.beginElement("dict")
        items = sorted(d.items())
        for key, value in items:
            if not isinstance(key, str):
                raise TypeError("keys must be strings")
            self.simpleElement("key", key)
            self.writeValue(value)
        self.endElement("dict")

    def writeArray(self, array):
        self.beginElement("array")
        for value in array:
            self.writeValue(value)
        self.endElement("array")



